Reduction of child mortality is reflected in several of the United Nations’ Sustainable Development Goals and is a key indicator of human progress. The UN expects that by 2030, countries end preventable deaths of newborns and children under 5 years of age, with all countries aiming to reduce under‑5 mortality to at least as low as 25 per 1,000 live births. Parallel to notion of child mortality is of course maternal mortality, which accounts for 295 000 deaths during and following pregnancy and childbirth (as of 2017). The vast majority of these deaths (94%) occurred in low-resource settings, and most could have been prevented. In light of what was mentioned above, Cardiotocograms (CTGs) are a simple and cost accessible option to assess fetal health, allowing healthcare professionals to take action in order to prevent child and maternal mortality. The equipment itself works by sending ultrasound pulses and reading its response, thus shedding light on fetal heart rate (FHR), fetal movements, uterine contractions and more.
The goal of this part of the homework is to apply all statistical tools studied in order to classify, in the best way possible (what is the “best” way possible will be discussed later on), a new fetus depending on various features. For this, we will use Quadratic Discriminant Analysis, Linear Discriminant Analysis, Naive Bayes and Logistic Regression.
Installing libraries
library("ggplot2")
library("MASS")
library("e1071")
library("gridExtra")
library("ggcorrplot")
library("VGAM")
## Loading required package: stats4
## Loading required package: splines
library("rrcov")
## Loading required package: robustbase
## Scalable Robust Estimators with High Breakdown Point (version 1.5-5)
library("bestNormalize")
##
## Attaching package: 'bestNormalize'
## The following object is masked from 'package:MASS':
##
## boxcox
library("outliers")
library("EnvStats")
## Registered S3 method overwritten by 'EnvStats':
## method from
## plot.boxcox bestNormalize
##
## Attaching package: 'EnvStats'
## The following object is masked from 'package:bestNormalize':
##
## boxcox
## The following object is masked from 'package:rrcov':
##
## predict
## The following objects are masked from 'package:VGAM':
##
## calibrate, dpareto, ppareto, predict, qpareto, rpareto
## The following objects are masked from 'package:e1071':
##
## kurtosis, skewness
## The following object is masked from 'package:MASS':
##
## boxcox
## The following objects are masked from 'package:stats':
##
## predict, predict.lm
## The following object is masked from 'package:base':
##
## print.default
library("dplyr")
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("plotrix")
library("caret")
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:VGAM':
##
## predictors
library("tidyverse")
## ── Attaching packages ───────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble 3.0.3 ✓ purrr 0.3.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::combine() masks gridExtra::combine()
## x tidyr::fill() masks VGAM::fill()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x purrr::lift() masks caret::lift()
## x dplyr::select() masks MASS::select()
library("penalizedLDA")
library("bnclassify")
##
## Attaching package: 'bnclassify'
## The following object is masked from 'package:dplyr':
##
## vars
## The following object is masked from 'package:EnvStats':
##
## cv
## The following object is masked from 'package:ggplot2':
##
## vars
library("naivebayes")
## naivebayes 0.9.7 loaded
library("MLeval")
library("caTools")
library("factoextra")
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library("FactoMineR")
library("paran")
####Defining Colours
color_1 <- "blue"
color_2 <- "red"
color_3 <- "yellow"
This dataset contains 2126 records of features extracted from Cardiotocogram exams, which were then classified by three expert obstetritians into 3 classes: Normal (0) Suspect (1) Pathological (2)
First, let´s read the data and study it.
data <- read.csv("fetal_health.csv")
dim(data)
## [1] 2126 22
head(data)
## baseline.value accelerations fetal_movement uterine_contractions
## 1 120 0.000 0 0.000
## 2 132 0.006 0 0.006
## 3 133 0.003 0 0.008
## 4 134 0.003 0 0.008
## 5 132 0.007 0 0.008
## 6 134 0.001 0 0.010
## light_decelerations severe_decelerations prolongued_decelerations
## 1 0.000 0 0.000
## 2 0.003 0 0.000
## 3 0.003 0 0.000
## 4 0.003 0 0.000
## 5 0.000 0 0.000
## 6 0.009 0 0.002
## abnormal_short_term_variability mean_value_of_short_term_variability
## 1 73 0.5
## 2 17 2.1
## 3 16 2.1
## 4 16 2.4
## 5 16 2.4
## 6 26 5.9
## percentage_of_time_with_abnormal_long_term_variability
## 1 43
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## mean_value_of_long_term_variability histogram_width histogram_min
## 1 2.4 64 62
## 2 10.4 130 68
## 3 13.4 130 68
## 4 23.0 117 53
## 5 19.9 117 53
## 6 0.0 150 50
## histogram_max histogram_number_of_peaks histogram_number_of_zeroes
## 1 126 2 0
## 2 198 6 1
## 3 198 5 1
## 4 170 11 0
## 5 170 9 0
## 6 200 5 3
## histogram_mode histogram_mean histogram_median histogram_variance
## 1 120 137 121 73
## 2 141 136 140 12
## 3 141 135 138 13
## 4 137 134 137 13
## 5 137 136 138 11
## 6 76 107 107 170
## histogram_tendency fetal_health
## 1 1 2
## 2 0 1
## 3 0 1
## 4 1 1
## 5 1 1
## 6 0 3
summary(data)
## baseline.value accelerations fetal_movement uterine_contractions
## Min. :106.0 Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:126.0 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.002000
## Median :133.0 Median :0.002000 Median :0.000000 Median :0.004000
## Mean :133.3 Mean :0.003178 Mean :0.009481 Mean :0.004366
## 3rd Qu.:140.0 3rd Qu.:0.006000 3rd Qu.:0.003000 3rd Qu.:0.007000
## Max. :160.0 Max. :0.019000 Max. :0.481000 Max. :0.015000
## light_decelerations severe_decelerations prolongued_decelerations
## Min. :0.000000 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.000000 Median :0.000e+00 Median :0.0000000
## Mean :0.001889 Mean :3.293e-06 Mean :0.0001585
## 3rd Qu.:0.003000 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0.015000 Max. :1.000e-03 Max. :0.0050000
## abnormal_short_term_variability mean_value_of_short_term_variability
## Min. :12.00 Min. :0.200
## 1st Qu.:32.00 1st Qu.:0.700
## Median :49.00 Median :1.200
## Mean :46.99 Mean :1.333
## 3rd Qu.:61.00 3rd Qu.:1.700
## Max. :87.00 Max. :7.000
## percentage_of_time_with_abnormal_long_term_variability
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 9.847
## 3rd Qu.:11.000
## Max. :91.000
## mean_value_of_long_term_variability histogram_width histogram_min
## Min. : 0.000 Min. : 3.00 Min. : 50.00
## 1st Qu.: 4.600 1st Qu.: 37.00 1st Qu.: 67.00
## Median : 7.400 Median : 67.50 Median : 93.00
## Mean : 8.188 Mean : 70.45 Mean : 93.58
## 3rd Qu.:10.800 3rd Qu.:100.00 3rd Qu.:120.00
## Max. :50.700 Max. :180.00 Max. :159.00
## histogram_max histogram_number_of_peaks histogram_number_of_zeroes
## Min. :122 Min. : 0.000 Min. : 0.0000
## 1st Qu.:152 1st Qu.: 2.000 1st Qu.: 0.0000
## Median :162 Median : 3.000 Median : 0.0000
## Mean :164 Mean : 4.068 Mean : 0.3236
## 3rd Qu.:174 3rd Qu.: 6.000 3rd Qu.: 0.0000
## Max. :238 Max. :18.000 Max. :10.0000
## histogram_mode histogram_mean histogram_median histogram_variance
## Min. : 60.0 Min. : 73.0 Min. : 77.0 Min. : 0.00
## 1st Qu.:129.0 1st Qu.:125.0 1st Qu.:129.0 1st Qu.: 2.00
## Median :139.0 Median :136.0 Median :139.0 Median : 7.00
## Mean :137.5 Mean :134.6 Mean :138.1 Mean : 18.81
## 3rd Qu.:148.0 3rd Qu.:145.0 3rd Qu.:148.0 3rd Qu.: 24.00
## Max. :187.0 Max. :182.0 Max. :186.0 Max. :269.00
## histogram_tendency fetal_health
## Min. :-1.0000 Min. :1.000
## 1st Qu.: 0.0000 1st Qu.:1.000
## Median : 0.0000 Median :1.000
## Mean : 0.3203 Mean :1.304
## 3rd Qu.: 1.0000 3rd Qu.:1.000
## Max. : 1.0000 Max. :3.000
We have to check if there are any missing values in our data.
n_miss <- apply(is.na(data),1,sum)
table(n_miss)
## n_miss
## 0
## 2126
As we can see, there are no missing values.
Let´s check the variables Description
lapply(data,class)
## $baseline.value
## [1] "numeric"
##
## $accelerations
## [1] "numeric"
##
## $fetal_movement
## [1] "numeric"
##
## $uterine_contractions
## [1] "numeric"
##
## $light_decelerations
## [1] "numeric"
##
## $severe_decelerations
## [1] "numeric"
##
## $prolongued_decelerations
## [1] "numeric"
##
## $abnormal_short_term_variability
## [1] "numeric"
##
## $mean_value_of_short_term_variability
## [1] "numeric"
##
## $percentage_of_time_with_abnormal_long_term_variability
## [1] "numeric"
##
## $mean_value_of_long_term_variability
## [1] "numeric"
##
## $histogram_width
## [1] "numeric"
##
## $histogram_min
## [1] "numeric"
##
## $histogram_max
## [1] "numeric"
##
## $histogram_number_of_peaks
## [1] "numeric"
##
## $histogram_number_of_zeroes
## [1] "numeric"
##
## $histogram_mode
## [1] "numeric"
##
## $histogram_mean
## [1] "numeric"
##
## $histogram_median
## [1] "numeric"
##
## $histogram_variance
## [1] "numeric"
##
## $histogram_tendency
## [1] "numeric"
##
## $fetal_health
## [1] "numeric"
This is a categorical variable, as it clasifies the overall health of the fetus. It takes value: 1- normal helath (0) 2- suspect health (1) 3- dangerous health (2)
summary(data$fetal_health)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 1.304 1.000 3.000
target.freq <- table(data$fetal_health)
target.freq
##
## 1 2 3
## 1655 295 176
target.freq.rel <- target.freq/nrow(data)
target.freq.rel
##
## 1 2 3
## 0.77845720 0.13875823 0.08278457
We can see that the classes are pretty unbalanced. This may become problematic with bayesian classifiers, as it is best if classes are balanced. We also see there is a clear predominancie of healthy fetus, and a minority of suspect and even less pathological fetus. Actually, almost 78% of the fetus in our data are considered healthy, while almost 14% suspicious and just 8% are in danger. This is reasonable: is we consider this a random sample, then most of the fetus are usually healthier, while it is not so common to find suspected or problematic fetus. After going through most of the research, I decided to go back to the beggining and start again with a target variable of only two levels. This is because there was too much noise on the models and very little information about some subgroups. After talking with a neonathologist, I decided that merging the suspicious and dangerous fetus is not a bad idea. This way the classes will be more balanced, and in practical effects, both suspisious and dangerous fetus would have to be subjected to deeper check.
data$fetal_health = factor(ifelse(data$fetal_health == 1, "Healthy" , "Warning"))
target.freq <- table(data$fetal_health)
target.freq
##
## Healthy Warning
## 1655 471
target.freq.rel <- target.freq/nrow(data)
target.freq.rel
##
## Healthy Warning
## 0.7784572 0.2215428
lbles <- paste(names(target.freq), "\n", target.freq, sep="")
pie3D(target.freq,labels=lbles,explode=0.1,main="Pie Chart of Fetal Health",)
ggplot(data=data, aes(x = fetal_health)) + geom_bar(aes(fill = fetal_health)) + ggtitle("Bar Plot for Fetal Health")
We can see that the classes are pretty unbalanced. This may become problematic with bayesian classifiers, as it is best if classes are balanced. We also see there is a clear predominancie of healthy fetus, and a minority of suspect and even less pathological fetus. Actually, almost 78% of the fetus in our data are considered healthy, while almost 14% suspicious and just 8% are in danger. This is reasonable: is we consider this a random sample, then most of the fetus are usually healthier, while it is not so common to find suspected or problematic fetus.
features <- setdiff(colnames(data),'fetal_health')
features
## [1] "baseline.value"
## [2] "accelerations"
## [3] "fetal_movement"
## [4] "uterine_contractions"
## [5] "light_decelerations"
## [6] "severe_decelerations"
## [7] "prolongued_decelerations"
## [8] "abnormal_short_term_variability"
## [9] "mean_value_of_short_term_variability"
## [10] "percentage_of_time_with_abnormal_long_term_variability"
## [11] "mean_value_of_long_term_variability"
## [12] "histogram_width"
## [13] "histogram_min"
## [14] "histogram_max"
## [15] "histogram_number_of_peaks"
## [16] "histogram_number_of_zeroes"
## [17] "histogram_mode"
## [18] "histogram_mean"
## [19] "histogram_median"
## [20] "histogram_variance"
## [21] "histogram_tendency"
We have 21 features, described as follows. 1) Baseline Value 2) Accelerations 3) Fetal Movement 4) Uterine Contractions 5) Light Decelerations 6) Severe decelerations 7) Prolongued Decelerations 8) Abnormal Short Term Variability 9) Mean Value of Short Term Variability 10) Percentage of time with Abnormal Long Term Variability 11) Mean Value of Long term Variability 12) Histogram Width 13) Histogram Min 14) Histogram Max 15) Histogram Number of Peaks 16) Histogram Number of Zeroes 17) Histogram Mode 18) Histogram Mean 19) Histogram Median 20) Histogram Variance 21) Histogram Tendency
Let´s explore their histograms
for (f in features) {
hist(data[,f], main=f)
}
Let´s study the problem in one dimension, to see if there is any single feature that helps us distinguish between the tree classes.
dat <- data
dat$fetal_health <- as.factor(dat$fetal_health)
univar_graph <- function(univar_name, univar, data, output_var) {
g_1 <- ggplot(data, aes(x=univar)) + geom_density() + xlab(univar_name)
g_2 <- ggplot(data, aes(x=univar, fill=output_var)) + geom_density(alpha=0.4) + xlab(univar_name)
grid.arrange(g_1, g_2, ncol=2, top=paste(univar_name,"variable", "/ [ Skew:",skewness(univar),"]"))
}
for (x in 1:(ncol(data)-1)) {
univar_graph(names(data)[x], data[,x], data, dat[,'fetal_health'])
}
We can see that variable “accelerations” distinguish a bit the two groups of interest.Also, “severe decelerations” look like it also distinguishes the two groups (as if the value it takes is different to zero, it looks like it is always talking about a warning fetus). The same happens with prolongued decelerations.Most of the other variables tend to be pretty similar in distribution between groups.
It would be interesting to perform the same analysis, but in a two dimessional perspective. unfortunatly ploting 21 quantitative variables with each other would result in a combination of: (21*21)-21 = 420 scatterplots. So we would have to analyze 110 scatterplots to see this properly. As this is not very reasonable, one good way of studying what variables may be influential to classify our data, we can use principal component analysis (this will be used later on).
There are some variables that get a lot (most of the observations) of zeros. For instance, most of the cases are of healthy fetus, so the “severe decelerations” are almost not seen in the data. The difficulty arises when we combine continuous variables with variables that take almost always the same value. This can be a problem, and can be caracterized as zero and near-zero variance predictors.
nzv <- nearZeroVar(data, saveMetrics= TRUE)
nzv[nzv$nzv,][1:21,]
## freqRatio percentUnique
## severe_decelerations 302.71429 0.09407338
## prolongued_decelerations 27.05556 0.28222013
## percentage_of_time_with_abnormal_long_term_variability 23.84615 4.09219191
## NA NA NA
## NA.1 NA NA
## NA.2 NA NA
## NA.3 NA NA
## NA.4 NA NA
## NA.5 NA NA
## NA.6 NA NA
## NA.7 NA NA
## NA.8 NA NA
## NA.9 NA NA
## NA.10 NA NA
## NA.11 NA NA
## NA.12 NA NA
## NA.13 NA NA
## NA.14 NA NA
## NA.15 NA NA
## NA.16 NA NA
## NA.17 NA NA
## zeroVar nzv
## severe_decelerations FALSE TRUE
## prolongued_decelerations FALSE TRUE
## percentage_of_time_with_abnormal_long_term_variability FALSE TRUE
## NA NA NA
## NA.1 NA NA
## NA.2 NA NA
## NA.3 NA NA
## NA.4 NA NA
## NA.5 NA NA
## NA.6 NA NA
## NA.7 NA NA
## NA.8 NA NA
## NA.9 NA NA
## NA.10 NA NA
## NA.11 NA NA
## NA.12 NA NA
## NA.13 NA NA
## NA.14 NA NA
## NA.15 NA NA
## NA.16 NA NA
## NA.17 NA NA
We can see that severe_decelerations may be a higly problematic variable in terms of variability. After considering and trying different options, I decided to keep it. This is because most models run with all this, and the truth is that “severe decelerations” almost does not vary because for healthy fetus is always takes value “0” and changes only for very extreme cases of unhelathy fetus. Having very unbalanced classes (still after combining suspicious and dangerous fetus), it would not be a very good idea to take out this variable, knowing that it has to do with very dangerous fetus. If i take it out from the models, then we would lose really valuable information about unhealthy fetus.
Let´s check for correlations, as this may also become a problem. If there are variables that are extremely linearly correlated, then our models will strike.
R <- cor(data[,1:21])
R
## baseline.value
## baseline.value 1.000000000
## accelerations -0.080559728
## fetal_movement -0.033436480
## uterine_contractions -0.146373271
## light_decelerations -0.159031878
## severe_decelerations -0.053517550
## prolongued_decelerations -0.104597095
## abnormal_short_term_variability 0.305570030
## mean_value_of_short_term_variability -0.279606597
## percentage_of_time_with_abnormal_long_term_variability 0.285629578
## mean_value_of_long_term_variability -0.032091023
## histogram_width -0.147678762
## histogram_min 0.361619496
## histogram_max 0.275109794
## histogram_number_of_peaks -0.113933323
## histogram_number_of_zeroes -0.004744583
## histogram_mode 0.708992500
## histogram_mean 0.723121038
## histogram_median 0.789246407
## histogram_variance -0.133938054
## histogram_tendency 0.293503485
## accelerations
## baseline.value -0.080559728
## accelerations 1.000000000
## fetal_movement 0.048234576
## uterine_contractions 0.089674232
## light_decelerations -0.108614752
## severe_decelerations -0.043018117
## prolongued_decelerations -0.127748624
## abnormal_short_term_variability -0.279577473
## mean_value_of_short_term_variability 0.207169746
## percentage_of_time_with_abnormal_long_term_variability -0.373943010
## mean_value_of_long_term_variability -0.142363119
## histogram_width 0.298631025
## histogram_min -0.154286330
## histogram_max 0.394146773
## histogram_number_of_peaks 0.190451945
## histogram_number_of_zeroes -0.006146567
## histogram_mode 0.243609990
## histogram_mean 0.270334428
## histogram_median 0.272849129
## histogram_variance 0.125703963
## histogram_tendency 0.028419837
## fetal_movement
## baseline.value -0.03343648
## accelerations 0.04823458
## fetal_movement 1.00000000
## uterine_contractions -0.06877872
## light_decelerations 0.04922850
## severe_decelerations -0.01097563
## prolongued_decelerations 0.26592207
## abnormal_short_term_variability -0.10371542
## mean_value_of_short_term_variability 0.12131421
## percentage_of_time_with_abnormal_long_term_variability -0.07409550
## mean_value_of_long_term_variability 0.01104743
## histogram_width 0.16279008
## histogram_min -0.15391727
## histogram_max 0.09985259
## histogram_number_of_peaks 0.16465407
## histogram_number_of_zeroes -0.01774917
## histogram_mode -0.06119241
## histogram_mean -0.08967121
## histogram_median -0.07232949
## histogram_variance 0.17933965
## histogram_tendency -0.00154140
## uterine_contractions
## baseline.value -0.146373271
## accelerations 0.089674232
## fetal_movement -0.068778721
## uterine_contractions 1.000000000
## light_decelerations 0.285079038
## severe_decelerations 0.006788278
## prolongued_decelerations 0.077036104
## abnormal_short_term_variability -0.232810664
## mean_value_of_short_term_variability 0.289678600
## percentage_of_time_with_abnormal_long_term_variability -0.306607677
## mean_value_of_long_term_variability -0.066058165
## histogram_width 0.142541039
## histogram_min -0.113323117
## histogram_max 0.122765666
## histogram_number_of_peaks 0.082692963
## histogram_number_of_zeroes 0.057894493
## histogram_mode -0.104853944
## histogram_mean -0.187504748
## histogram_median -0.140287407
## histogram_variance 0.238581881
## histogram_tendency -0.072313551
## light_decelerations
## baseline.value -1.590319e-01
## accelerations -1.086148e-01
## fetal_movement 4.922850e-02
## uterine_contractions 2.850790e-01
## light_decelerations 1.000000e+00
## severe_decelerations 1.075730e-01
## prolongued_decelerations 2.256113e-01
## abnormal_short_term_variability -1.191518e-01
## mean_value_of_short_term_variability 5.621699e-01
## percentage_of_time_with_abnormal_long_term_variability -2.712825e-01
## mean_value_of_long_term_variability -2.429321e-01
## histogram_width 5.204674e-01
## histogram_min -5.535336e-01
## histogram_max 2.180426e-01
## histogram_number_of_peaks 3.976204e-01
## histogram_number_of_zeroes 2.352958e-01
## histogram_mode -3.472325e-01
## histogram_mean -5.273544e-01
## histogram_median -3.885856e-01
## histogram_variance 5.642894e-01
## histogram_tendency 7.161302e-05
## severe_decelerations
## baseline.value -0.053517550
## accelerations -0.043018117
## fetal_movement -0.010975631
## uterine_contractions 0.006788278
## light_decelerations 0.107572979
## severe_decelerations 1.000000000
## prolongued_decelerations 0.012395431
## abnormal_short_term_variability 0.033948594
## mean_value_of_short_term_variability 0.034129856
## percentage_of_time_with_abnormal_long_term_variability -0.030770213
## mean_value_of_long_term_variability -0.037666897
## histogram_width 0.044879585
## histogram_min -0.071973814
## histogram_max -0.021134754
## histogram_number_of_peaks 0.007024261
## histogram_number_of_zeroes 0.043441387
## histogram_mode -0.215160740
## histogram_mean -0.158672869
## histogram_median -0.160451257
## histogram_variance 0.136420521
## histogram_tendency -0.070483116
## prolongued_decelerations
## baseline.value -0.10459709
## accelerations -0.12774862
## fetal_movement 0.26592207
## uterine_contractions 0.07703610
## light_decelerations 0.22561133
## severe_decelerations 0.01239543
## prolongued_decelerations 1.00000000
## abnormal_short_term_variability 0.04622569
## mean_value_of_short_term_variability 0.26701062
## percentage_of_time_with_abnormal_long_term_variability -0.13733337
## mean_value_of_long_term_variability -0.22651353
## histogram_width 0.26539100
## histogram_min -0.27676450
## histogram_max 0.12022129
## histogram_number_of_peaks 0.22285969
## histogram_number_of_zeroes 0.05642320
## histogram_mode -0.43641595
## histogram_mean -0.48866344
## histogram_median -0.44477787
## histogram_variance 0.50330146
## histogram_tendency -0.21540452
## abnormal_short_term_variability
## baseline.value 0.305570030
## accelerations -0.279577473
## fetal_movement -0.103715419
## uterine_contractions -0.232810664
## light_decelerations -0.119151839
## severe_decelerations 0.033948594
## prolongued_decelerations 0.046225691
## abnormal_short_term_variability 1.000000000
## mean_value_of_short_term_variability -0.430704976
## percentage_of_time_with_abnormal_long_term_variability 0.459412717
## mean_value_of_long_term_variability -0.315104671
## histogram_width -0.260462577
## histogram_min 0.275377994
## histogram_max -0.111806050
## histogram_number_of_peaks -0.167561291
## histogram_number_of_zeroes -0.149296431
## histogram_mode 0.058363094
## histogram_mean 0.074553692
## histogram_median 0.119960169
## histogram_variance -0.146433817
## histogram_tendency -0.005747923
## mean_value_of_short_term_variability
## baseline.value -0.27960660
## accelerations 0.20716975
## fetal_movement 0.12131421
## uterine_contractions 0.28967860
## light_decelerations 0.56216989
## severe_decelerations 0.03412986
## prolongued_decelerations 0.26701062
## abnormal_short_term_variability -0.43070498
## mean_value_of_short_term_variability 1.00000000
## percentage_of_time_with_abnormal_long_term_variability -0.47025888
## mean_value_of_long_term_variability 0.07389177
## histogram_width 0.66084678
## histogram_min -0.62256925
## histogram_max 0.40907212
## histogram_number_of_peaks 0.50142950
## histogram_number_of_zeroes 0.26618303
## histogram_mode -0.30758650
## histogram_mean -0.44540138
## histogram_median -0.33610928
## histogram_variance 0.55585244
## histogram_tendency -0.06613969
## percentage_of_time_with_abnormal_long_term_variability
## baseline.value 0.28562958
## accelerations -0.37394301
## fetal_movement -0.07409550
## uterine_contractions -0.30660768
## light_decelerations -0.27128246
## severe_decelerations -0.03077021
## prolongued_decelerations -0.13733337
## abnormal_short_term_variability 0.45941272
## mean_value_of_short_term_variability -0.47025888
## percentage_of_time_with_abnormal_long_term_variability 1.00000000
## mean_value_of_long_term_variability -0.17111405
## histogram_width -0.45129650
## histogram_min 0.42283438
## histogram_max -0.28318335
## histogram_number_of_peaks -0.27930068
## histogram_number_of_zeroes -0.12178368
## histogram_mode 0.16521142
## histogram_mean 0.22232062
## histogram_median 0.18648019
## histogram_variance -0.28153615
## histogram_tendency 0.04248124
## mean_value_of_long_term_variability
## baseline.value -0.032091023
## accelerations -0.142363119
## fetal_movement 0.011047430
## uterine_contractions -0.066058165
## light_decelerations -0.242932065
## severe_decelerations -0.037666897
## prolongued_decelerations -0.226513529
## abnormal_short_term_variability -0.315104671
## mean_value_of_short_term_variability 0.073891775
## percentage_of_time_with_abnormal_long_term_variability -0.171114053
## mean_value_of_long_term_variability 1.000000000
## histogram_width 0.110942057
## histogram_min -0.144976343
## histogram_max 0.002022563
## histogram_number_of_peaks 0.056357459
## histogram_number_of_zeroes 0.123869099
## histogram_mode 0.072070968
## histogram_mean 0.137813332
## histogram_median 0.063227533
## histogram_variance -0.164079171
## histogram_tendency 0.153092953
## histogram_width
## baseline.value -0.14767876
## accelerations 0.29863103
## fetal_movement 0.16279008
## uterine_contractions 0.14254104
## light_decelerations 0.52046737
## severe_decelerations 0.04487958
## prolongued_decelerations 0.26539100
## abnormal_short_term_variability -0.26046258
## mean_value_of_short_term_variability 0.66084678
## percentage_of_time_with_abnormal_long_term_variability -0.45129650
## mean_value_of_long_term_variability 0.11094206
## histogram_width 1.00000000
## histogram_min -0.89851896
## histogram_max 0.69076879
## histogram_number_of_peaks 0.74707094
## histogram_number_of_zeroes 0.31727611
## histogram_mode -0.15926065
## histogram_mean -0.28084632
## histogram_median -0.16885434
## histogram_variance 0.61588381
## histogram_tendency 0.11815183
## histogram_min
## baseline.value 0.36161950
## accelerations -0.15428633
## fetal_movement -0.15391727
## uterine_contractions -0.11332312
## light_decelerations -0.55353359
## severe_decelerations -0.07197381
## prolongued_decelerations -0.27676450
## abnormal_short_term_variability 0.27537799
## mean_value_of_short_term_variability -0.62256925
## percentage_of_time_with_abnormal_long_term_variability 0.42283438
## mean_value_of_long_term_variability -0.14497634
## histogram_width -0.89851896
## histogram_min 1.00000000
## histogram_max -0.30328584
## histogram_number_of_peaks -0.67028685
## histogram_number_of_zeroes -0.30656707
## histogram_mode 0.35306685
## histogram_mean 0.48612117
## histogram_median 0.40019105
## histogram_variance -0.54509064
## histogram_tendency -0.24257913
## histogram_max
## baseline.value 0.275109794
## accelerations 0.394146773
## fetal_movement 0.099852590
## uterine_contractions 0.122765666
## light_decelerations 0.218042634
## severe_decelerations -0.021134754
## prolongued_decelerations 0.120221292
## abnormal_short_term_variability -0.111806050
## mean_value_of_short_term_variability 0.409072124
## percentage_of_time_with_abnormal_long_term_variability -0.283183354
## mean_value_of_long_term_variability 0.002022563
## histogram_width 0.690768790
## histogram_min -0.303285839
## histogram_max 1.000000000
## histogram_number_of_peaks 0.517652125
## histogram_number_of_zeroes 0.183765568
## histogram_mode 0.235877114
## histogram_mean 0.191108278
## histogram_median 0.292679501
## histogram_variance 0.439094134
## histogram_tendency -0.143110677
## histogram_number_of_peaks
## baseline.value -0.113933323
## accelerations 0.190451945
## fetal_movement 0.164654071
## uterine_contractions 0.082692963
## light_decelerations 0.397620358
## severe_decelerations 0.007024261
## prolongued_decelerations 0.222859690
## abnormal_short_term_variability -0.167561291
## mean_value_of_short_term_variability 0.501429501
## percentage_of_time_with_abnormal_long_term_variability -0.279300683
## mean_value_of_long_term_variability 0.056357459
## histogram_width 0.747070940
## histogram_min -0.670286852
## histogram_max 0.517652125
## histogram_number_of_peaks 1.000000000
## histogram_number_of_zeroes 0.288592536
## histogram_mode -0.100123321
## histogram_mean -0.217242336
## histogram_median -0.122844095
## histogram_variance 0.454353980
## histogram_tendency 0.110114103
## histogram_number_of_zeroes
## baseline.value -0.004744583
## accelerations -0.006146567
## fetal_movement -0.017749174
## uterine_contractions 0.057894493
## light_decelerations 0.235295786
## severe_decelerations 0.043441387
## prolongued_decelerations 0.056423204
## abnormal_short_term_variability -0.149296431
## mean_value_of_short_term_variability 0.266183026
## percentage_of_time_with_abnormal_long_term_variability -0.121783680
## mean_value_of_long_term_variability 0.123869099
## histogram_width 0.317276110
## histogram_min -0.306567072
## histogram_max 0.183765568
## histogram_number_of_peaks 0.288592536
## histogram_number_of_zeroes 1.000000000
## histogram_mode -0.057978109
## histogram_mean -0.084417291
## histogram_median -0.052896401
## histogram_variance 0.196746881
## histogram_tendency 0.084694222
## histogram_mode
## baseline.value 0.70899250
## accelerations 0.24360999
## fetal_movement -0.06119241
## uterine_contractions -0.10485394
## light_decelerations -0.34723254
## severe_decelerations -0.21516074
## prolongued_decelerations -0.43641595
## abnormal_short_term_variability 0.05836309
## mean_value_of_short_term_variability -0.30758650
## percentage_of_time_with_abnormal_long_term_variability 0.16521142
## mean_value_of_long_term_variability 0.07207097
## histogram_width -0.15926065
## histogram_min 0.35306685
## histogram_max 0.23587711
## histogram_number_of_peaks -0.10012332
## histogram_number_of_zeroes -0.05797811
## histogram_mode 1.00000000
## histogram_mean 0.89341238
## histogram_median 0.93339916
## histogram_variance -0.31276475
## histogram_tendency 0.41556388
## histogram_mean
## baseline.value 0.72312104
## accelerations 0.27033443
## fetal_movement -0.08967121
## uterine_contractions -0.18750475
## light_decelerations -0.52735437
## severe_decelerations -0.15867287
## prolongued_decelerations -0.48866344
## abnormal_short_term_variability 0.07455369
## mean_value_of_short_term_variability -0.44540138
## percentage_of_time_with_abnormal_long_term_variability 0.22232062
## mean_value_of_long_term_variability 0.13781333
## histogram_width -0.28084632
## histogram_min 0.48612117
## histogram_max 0.19110828
## histogram_number_of_peaks -0.21724234
## histogram_number_of_zeroes -0.08441729
## histogram_mode 0.89341238
## histogram_mean 1.00000000
## histogram_median 0.94825134
## histogram_variance -0.39942323
## histogram_tendency 0.32707559
## histogram_median
## baseline.value 0.78924641
## accelerations 0.27284913
## fetal_movement -0.07232949
## uterine_contractions -0.14028741
## light_decelerations -0.38858559
## severe_decelerations -0.16045126
## prolongued_decelerations -0.44477787
## abnormal_short_term_variability 0.11996017
## mean_value_of_short_term_variability -0.33610928
## percentage_of_time_with_abnormal_long_term_variability 0.18648019
## mean_value_of_long_term_variability 0.06322753
## histogram_width -0.16885434
## histogram_min 0.40019105
## histogram_max 0.29267950
## histogram_number_of_peaks -0.12284410
## histogram_number_of_zeroes -0.05289640
## histogram_mode 0.93339916
## histogram_mean 0.94825134
## histogram_median 1.00000000
## histogram_variance -0.29410467
## histogram_tendency 0.38920975
## histogram_variance
## baseline.value -0.13393805
## accelerations 0.12570396
## fetal_movement 0.17933965
## uterine_contractions 0.23858188
## light_decelerations 0.56428941
## severe_decelerations 0.13642052
## prolongued_decelerations 0.50330146
## abnormal_short_term_variability -0.14643382
## mean_value_of_short_term_variability 0.55585244
## percentage_of_time_with_abnormal_long_term_variability -0.28153615
## mean_value_of_long_term_variability -0.16407917
## histogram_width 0.61588381
## histogram_min -0.54509064
## histogram_max 0.43909413
## histogram_number_of_peaks 0.45435398
## histogram_number_of_zeroes 0.19674688
## histogram_mode -0.31276475
## histogram_mean -0.39942323
## histogram_median -0.29410467
## histogram_variance 1.00000000
## histogram_tendency -0.07801258
## histogram_tendency
## baseline.value 2.935035e-01
## accelerations 2.841984e-02
## fetal_movement -1.541400e-03
## uterine_contractions -7.231355e-02
## light_decelerations 7.161302e-05
## severe_decelerations -7.048312e-02
## prolongued_decelerations -2.154045e-01
## abnormal_short_term_variability -5.747923e-03
## mean_value_of_short_term_variability -6.613969e-02
## percentage_of_time_with_abnormal_long_term_variability 4.248124e-02
## mean_value_of_long_term_variability 1.530930e-01
## histogram_width 1.181518e-01
## histogram_min -2.425791e-01
## histogram_max -1.431107e-01
## histogram_number_of_peaks 1.101141e-01
## histogram_number_of_zeroes 8.469422e-02
## histogram_mode 4.155639e-01
## histogram_mean 3.270756e-01
## histogram_median 3.892097e-01
## histogram_variance -7.801258e-02
## histogram_tendency 1.000000e+00
ggcorrplot(R,lab=TRUE)
Now, I will proceed to take out some variables that have a perfect colinearity with others, meaning they have either 1 or very close to 1 correlation. When deciding which of the two variables to take out (between a pair of highly correlated variables) I decided to take out those who did not vary a lot between healthy and warning fetus.
X = subset(data, select = -c(histogram_median,histogram_mode,histogram_min,histogram_variance,histogram_mean))
dim(X)
## [1] 2126 17
X_matr <- data.matrix(X)
comboInfo <- findLinearCombos(X_matr)
comboInfo
## $linearCombos
## list()
##
## $remove
## NULL
There are no columns that are a linear combination of others.
Now I will transform the variables, to make them more symetrical and close to a gaussian. Also, when diong this, sometimes some outliers dissapear, as variability shrinkages.
Let´s transform the variables into gaussians when possible.
X_trans <- X
baseline_value_norm <- bestNormalize(X_trans$baseline.value,out_of_sample = TRUE)
baseline_value_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 2.7836
## - Box-Cox: 2.6679
## - Exp(x): 230.5445
## - Log_b(x+a): 2.7836
## - No transform: 2.6493
## - orderNorm (ORQ): 1.8969
## - sqrt(x + a): 2.6393
## - Yeo-Johnson: 2.6679
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
## - 48 unique values
## - Original quantiles:
## 0% 25% 50% 75% 100%
## 106 126 133 140 160
X_trans[,1] <- baseline_value_norm$x.t
colnames(X_trans)[1] <- "baseline_value_norm"
accelerations_norm <- bestNormalize(X_trans$accelerations,out_of_sample = TRUE)
accelerations_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 41.2685
## - Exp(x): 41.2685
## - Log_b(x+a): 43.0878
## - No transform: 41.2685
## - orderNorm (ORQ): 41.143
## - sqrt(x + a): 42.178
## - Yeo-Johnson: 41.323
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
## - 20 unique values
## - Original quantiles:
## 0% 25% 50% 75% 100%
## 0.000 0.000 0.002 0.006 0.019
X_trans[,2] <- accelerations_norm$x.t
colnames(X_trans)[2] <- "accelerations_norm"
fetal_movement_norm <- bestNormalize(X_trans$fetal_movement,out_of_sample = TRUE)
fetal_movement_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 139.094
## - Exp(x): 138.3539
## - Log_b(x+a): 89.1098
## - No transform: 138.7304
## - orderNorm (ORQ): 89.9936
## - sqrt(x + a): 88.2207
## - Yeo-Johnson: 109.2949
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## Standardized sqrt(x + a) Transformation with 2126 nonmissing obs.:
## Relevant statistics:
## - a = 0
## - mean (before standardization) = 0.0402022
## - sd (before standardization) = 0.08870287
X_trans[,3] <- fetal_movement_norm$x.t
colnames(X_trans)[3] <- "fetal_movement_norm"
uterine_contractions_norm <- bestNormalize(X_trans$uterine_contractions)
uterine_contractions_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 14.4396
## - Exp(x): 14.4396
## - Log_b(x+a): 15.809
## - No transform: 14.4396
## - orderNorm (ORQ): 14.1491
## - sqrt(x + a): 14.4808
## - Yeo-Johnson: 14.526
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
## - 16 unique values
## - Original quantiles:
## 0% 25% 50% 75% 100%
## 0.000 0.002 0.004 0.007 0.015
X_trans[,4] <- uterine_contractions_norm$x.t
colnames(X_trans)[4] <- "uterine_contractions_norm"
light_decelerations_norm <- bestNormalize(X_trans$light_decelerations)
light_decelerations_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 78.946
## - Exp(x): 78.9381
## - Log_b(x+a): 80.275
## - No transform: 78.946
## - orderNorm (ORQ): 79.191
## - sqrt(x + a): 79.8238
## - Yeo-Johnson: 79.0013
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## Standardized exp(x) Transformation with 2126 nonmissing obs.:
## Relevant statistics:
## - mean (before standardization) = 1.001896
## - sd (before standardization) = 0.002973372
X_trans[,5] <- light_decelerations_norm$x.t
colnames(X_trans)[5] <- "light_decelerations_norm"
severe_decelerations_norm <- bestNormalize(X_trans$severe_decelerations)
severe_decelerations_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 144.0906
## - Exp(x): 144.0906
## - Log_b(x+a): 144.0906
## - No transform: 144.0906
## - orderNorm (ORQ): 144.0906
## - sqrt(x + a): 144.0906
## - Yeo-Johnson: 144.0906
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## Standardized asinh(x) Transformation with 2126 nonmissing obs.:
## Relevant statistics:
## - mean (before standardization) = 3.292568e-06
## - sd (before standardization) = 5.729983e-05
X_trans[,6] <- severe_decelerations_norm$x.t
colnames(X_trans)[6] <- "severe_decelerations_norm"
prolongued_decelerations_norm <- bestNormalize(X_trans$prolongued_decelerations)
prolongued_decelerations_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 201.3479
## - Exp(x): 201.3479
## - Log_b(x+a): 201.93
## - No transform: 201.3479
## - orderNorm (ORQ): 201.9626
## - sqrt(x + a): 201.9626
## - Yeo-Johnson: 201.3479
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## Standardized asinh(x) Transformation with 2126 nonmissing obs.:
## Relevant statistics:
## - mean (before standardization) = 0.0001585135
## - sd (before standardization) = 0.0005899466
X_trans[,7] <- prolongued_decelerations_norm$x.t
colnames(X_trans)[7] <- "prolongued_decelerations_norm"
abnormal_short_term_variability_norm <- bestNormalize(X_trans$abnormal_short_term_variability)
abnormal_short_term_variability_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 5.1521
## - Box-Cox: 3.4535
## - Exp(x): 230.6246
## - Log_b(x+a): 5.1521
## - No transform: 3.7012
## - orderNorm (ORQ): 1.3693
## - sqrt(x + a): 3.6356
## - Yeo-Johnson: 3.4894
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
## - 75 unique values
## - Original quantiles:
## 0% 25% 50% 75% 100%
## 12 32 49 61 87
X_trans[,8] <- abnormal_short_term_variability_norm$x.t
colnames(X_trans)[8] <- "abnormal_short_term_variability_norm"
mean_value_of_st_var_norm <- bestNormalize(X_trans$mean_value_of_short_term_variability)
mean_value_of_st_var_norm
## Best Normalizing transformation with 2126 Observations
## Estimated Normality Statistics (Pearson P / df, lower => more normal):
## - arcsinh(x): 2.5215
## - Box-Cox: 2.156
## - Exp(x): 86.6351
## - Log_b(x+a): 2.5266
## - No transform: 5.4903
## - orderNorm (ORQ): 1.7439
## - sqrt(x + a): 2.4089
## - Yeo-Johnson: 2.1268
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
## - 57 unique values
## - Original quantiles:
## 0% 25% 50% 75% 100%
## 0.2 0.7 1.2 1.7 7.0
X_trans[,9] <- mean_value_of_st_var_norm$x.t
colnames(X_trans)[9] <- "mean_value_of_st_var_norm"
perc_of_time_w_abn_long_term_var_norm <- bestNormalize(X_trans$percentage_of_time_with_abnormal_long_term_variability)
X_trans[,10] <- perc_of_time_w_abn_long_term_var_norm$x.t
colnames(X_trans)[10] <- "perc_of_time_w_abn_long_term_var_norm"
mean_value_of_lt_var_norm <- bestNormalize(X_trans$mean_value_of_long_term_variability)
X_trans[,11] <- mean_value_of_lt_var_norm$x.t
colnames(X_trans)[11] <- "mean_value_of_lt_var_norm"
histogram_width_norm <- bestNormalize(X_trans$histogram_width)
X_trans[,12] <- histogram_width_norm$x.t
colnames(X_trans)[12] <- "histogram_width_norm"
histogram_max_norm <- bestNormalize(X_trans$histogram_max)
X_trans[,13] <- histogram_max_norm$x.t
colnames(X_trans)[13] <- "histogram_max_norm"
histogram_number_piks_norm <- bestNormalize(X_trans$histogram_number_of_peaks)
X_trans[,14] <- histogram_number_piks_norm$x.t
colnames(X_trans)[14] <- "histogram_number_piks_norm"
histogram_number_zeros_norm <- bestNormalize(X_trans$histogram_number_of_zeroes)
X_trans[,15] <- histogram_number_zeros_norm$x.t
colnames(X_trans)[15] <- "histogram_number_zeros_norm"
histogram_tendency_norm <- bestNormalize(X_trans$histogram_tendency +1)
X_trans[,16] <- histogram_tendency_norm$x.t
colnames(X_trans)[16] <- "histogram_tendency_norm"
summary(X_trans)
## baseline_value_norm accelerations_norm fetal_movement_norm
## Min. :-2.939015 Min. :-0.80554 Min. :-0.4532
## 1st Qu.:-0.687125 1st Qu.:-0.80554 1st Qu.:-0.4532
## Median :-0.009432 Median : 0.06371 Median :-0.4532
## Mean : 0.000341 Mean : 0.05167 Mean : 0.0000
## 3rd Qu.: 0.628566 3rd Qu.: 0.74503 3rd Qu.: 0.1643
## Max. : 3.497088 Max. : 3.49709 Max. : 7.3655
## uterine_contractions_norm light_decelerations_norm severe_decelerations_norm
## Min. :-1.41810 Min. :-0.6375 Min. :-0.05746
## 1st Qu.:-0.67671 1st Qu.:-0.6375 1st Qu.:-0.05746
## Median :-0.14077 Median :-0.6375 Median :-0.05746
## Mean : 0.01627 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.83192 3rd Qu.: 0.3729 3rd Qu.:-0.05746
## Max. : 3.49709 Max. : 4.4453 Max. :17.39459
## prolongued_decelerations_norm abnormal_short_term_variability_norm
## Min. :-0.2687 Min. :-3.307677
## 1st Qu.:-0.2687 1st Qu.:-0.678195
## Median :-0.2687 Median : 0.008843
## Mean : 0.0000 Mean : 0.000098
## 3rd Qu.:-0.2687 3rd Qu.: 0.668581
## Max. : 8.2066 Max. : 3.497088
## mean_value_of_st_var_norm perc_of_time_w_abn_long_term_var_norm
## Min. :-2.288520 Min. :-0.6965
## 1st Qu.:-0.656832 1st Qu.:-0.6965
## Median : 0.032429 Median :-0.6965
## Mean : 0.003249 Mean : 0.0000
## 3rd Qu.: 0.645900 3rd Qu.: 0.5914
## Max. : 3.497088 Max. : 3.0076
## mean_value_of_lt_var_norm histogram_width_norm histogram_max_norm
## Min. :-1.849122 Min. :-3.307677 Min. :-3.307677
## 1st Qu.:-0.668581 1st Qu.:-0.667107 1st Qu.:-0.659028
## Median :-0.004127 Median : 0.000000 Median : 0.002948
## Mean : 0.006749 Mean : 0.000017 Mean :-0.000237
## 3rd Qu.: 0.667844 3rd Qu.: 0.670794 3rd Qu.: 0.650263
## Max. : 3.497088 Max. : 3.497088 Max. : 2.986468
## histogram_number_piks_norm histogram_number_zeros_norm histogram_tendency_norm
## Min. :-1.95715 Min. : 0.0000 Min. :-2.5754
## 1st Qu.:-0.53566 1st Qu.: 0.0000 1st Qu.:-0.3859
## Median :-0.15806 Median : 0.0000 Median :-0.3859
## Mean : 0.01012 Mean : 0.3236 Mean : 0.0000
## 3rd Qu.: 0.69912 3rd Qu.: 0.0000 3rd Qu.: 1.0109
## Max. : 3.49709 Max. :10.0000 Max. : 1.0109
## fetal_health
## Healthy:1655
## Warning: 471
##
##
##
##
By looking at the histograms and new summary of the variables after transformation, there are 4 variables that should be tested for outliers. To do this I will use the Rosner´s test. This is a very useful test, not only because it provides us with a formal test to check for outliers in each variable, but also because it can detect more than one at the same time. This test is also designed to detect problems related to masking: this is when two outliers are very close and one of them goes undetected. I will use k=4 as the number of suspected outliers in the variables.
test_3 <- rosnerTest(X_trans$fetal_movement_norm,k=10)
test_3$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 -1.534320e-17 1.0000000 7.365482 697 7.365482 4.220179 TRUE
## 2 1 -3.466109e-03 0.9873790 7.332904 695 7.430147 4.220070 TRUE
## 3 2 -6.920144e-03 0.9746858 7.275562 694 7.471620 4.219960 TRUE
## 4 3 -1.035042e-02 0.9620061 7.267336 20 7.565115 4.219851 TRUE
## 5 4 -1.378006e-02 0.9491621 7.151231 692 7.548775 4.219742 TRUE
## 6 5 -1.715819e-02 0.9365396 7.117731 19 7.618352 4.219632 TRUE
## 7 6 -2.052370e-02 0.9238426 7.075646 696 7.681146 4.219523 TRUE
## 8 7 -2.387253e-02 0.9110992 7.050282 693 7.764417 4.219414 TRUE
## 9 8 -2.721255e-02 0.8982448 7.033325 17 7.860371 4.219304 TRUE
## 10 9 -3.054771e-02 0.8852428 6.939366 700 7.873448 4.219194 TRUE
test_6 <- rosnerTest(X_trans$severe_decelerations_norm,k=10)
test_6$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 1.000759e-17 1.0000000 17.39459 1489 17.39459 4.220179 TRUE
## 2 1 -8.185691e-03 0.9262559 17.39459 1490 18.78831 4.220070 TRUE
## 3 2 -1.637909e-02 0.8459503 17.39459 1792 20.58156 4.219960 TRUE
## 4 3 -2.458021e-02 0.7569974 17.39459 1793 23.01088 4.219851 TRUE
## 5 4 -3.278906e-02 0.6558880 17.39459 1794 26.57067 4.219742 TRUE
## 6 5 -4.100564e-02 0.5357829 17.39459 1795 32.54229 4.219632 TRUE
## 7 6 -4.922998e-02 0.3790344 17.39459 1796 46.02174 4.219523 TRUE
## 8 7 -5.746208e-02 0.0000000 NA NA NA 4.219414 NA
## 9 8 NA NA NA NA NA 4.219304 NA
## 10 9 NA NA NA NA NA 4.219194 NA
test_7 <- rosnerTest(X_trans$prolongued_decelerations_norm,k=10)
test_7$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 3.075717e-17 1.0000000 8.206617 705 8.206617 4.220179 TRUE
## 2 1 -3.861937e-03 0.9842497 8.206617 1755 8.341866 4.220070 TRUE
## 3 2 -7.727511e-03 0.9682126 8.206617 1756 8.484029 4.219960 TRUE
## 4 3 -1.159673e-02 0.9518742 6.511566 683 6.852967 4.219851 TRUE
## 5 4 -1.467079e-02 0.9414989 6.511566 684 6.931752 4.219742 TRUE
## 6 5 -1.774775e-02 0.9309878 6.511566 703 7.013318 4.219632 TRUE
## 7 6 -2.082762e-02 0.9203363 6.511566 1350 7.097832 4.219523 TRUE
## 8 7 -2.391039e-02 0.9095394 6.511566 1750 7.185478 4.219414 TRUE
## 9 8 -2.699607e-02 0.8985919 6.511566 1754 7.276453 4.219304 TRUE
## 10 9 -3.008467e-02 0.8874879 6.511566 1757 7.370974 4.219194 TRUE
test_16 <- rosnerTest(X_trans$histogram_number_zeros_norm,k=10)
test_16$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 0.3236124 0.7060594 10 142 13.704779 4.220179 TRUE
## 2 1 0.3190588 0.6742779 8 439 11.391358 4.220070 TRUE
## 3 2 0.3154426 0.6535001 7 440 10.228854 4.219960 TRUE
## 4 3 0.3122939 0.6373355 5 144 7.355162 4.219851 TRUE
## 5 4 0.3100848 0.6293034 5 643 7.452551 4.219742 TRUE
## 6 5 0.3078736 0.6211517 4 90 5.944001 4.219632 TRUE
## 7 6 0.3061321 0.6160969 4 390 5.995596 4.219523 TRUE
## 8 7 0.3043889 0.6109905 3 6 4.411871 4.219414 TRUE
## 9 8 0.3031161 0.6083187 3 7 4.433340 4.219304 TRUE
## 10 9 0.3018422 0.6056300 3 391 4.455126 4.219194 TRUE
Even when these tests are throwing results as outliers, I decided I am not going to get rid of them. The reason lies on the structure of the data. This is due to the fact that most of the fetus in the sample are healthy, therefore most of them have very standard (most of the times equal to zero) measures of several variables. If I take out 10 observations that represent severe cases of unhelthy fetus, I would be extracting very useful information from my data and I would make it even more unbalanced towards healthy fetus This way, I will interpret that these values correspond to the most serious cases, the ones that are zero (or very repetead) correspond to healthy fetus.
Next step in our preprocessing of data is to standarize the data.
By standarizing we mean transforming the data to get a mean equal to zero, and standard deviation equal to one.
X_trans[1:16] <- (lapply(X_trans[1:16],scale, center=TRUE,scale=TRUE))
summary(X_trans)
## baseline_value_norm.V1 accelerations_norm.V1 fetal_movement_norm.V1
## Min. :-2.951218 Min. :-0.977989 Min. :-0.453223
## 1st Qu.:-0.690240 1st Qu.:-0.977989 1st Qu.:-0.453223
## Median :-0.009813 Median : 0.013740 Median :-0.453223
## Mean : 0.000000 Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.630761 3rd Qu.: 0.791053 3rd Qu.: 0.164257
## Max. : 3.510858 Max. : 3.930881 Max. : 7.365482
## uterine_contractions_norm.V1 light_decelerations_norm.V1
## Min. :-1.509392 Min. :-0.637540
## 1st Qu.:-0.729227 1st Qu.:-0.637540
## Median :-0.165253 Median :-0.637540
## Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.858307 3rd Qu.: 0.372930
## Max. : 3.662875 Max. : 4.445262
## severe_decelerations_norm.V1 prolongued_decelerations_norm.V1
## Min. :-0.057462 Min. :-0.268691
## 1st Qu.:-0.057462 1st Qu.:-0.268691
## Median :-0.057462 Median :-0.268691
## Mean : 0.000000 Mean : 0.000000
## 3rd Qu.:-0.057462 3rd Qu.:-0.268691
## Max. :17.394594 Max. : 8.206617
## abnormal_short_term_variability_norm.V1 mean_value_of_st_var_norm.V1
## Min. :-3.312330 Min. :-2.313644
## 1st Qu.:-0.679227 1st Qu.:-0.666382
## Median : 0.008757 Median : 0.029458
## Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.669404 3rd Qu.: 0.648784
## Max. : 3.501807 Max. : 3.527187
## perc_of_time_w_abn_long_term_var_norm.V1 mean_value_of_lt_var_norm.V1
## Min. :-0.6964575 Min. :-1.889534
## 1st Qu.:-0.6964575 1st Qu.:-0.687580
## Median :-0.6964575 Median :-0.011073
## Mean : 0.0000000 Mean : 0.000000
## 3rd Qu.: 0.5913592 3rd Qu.: 0.673086
## Max. : 3.0076059 Max. : 3.553650
## histogram_width_norm.V1 histogram_max_norm.V1 histogram_number_piks_norm.V1
## Min. :-3.310674 Min. :-3.313034 Min. :-2.028587
## 1st Qu.:-0.667725 1st Qu.:-0.659905 1st Qu.:-0.562792
## Median :-0.000017 Median : 0.003190 Median :-0.173418
## Mean : 0.000000 Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.671382 3rd Qu.: 0.651601 3rd Qu.: 0.710472
## Max. : 3.500222 Max. : 2.991758 Max. : 3.595642
## histogram_number_zeros_norm.V1 histogram_tendency_norm.V1 fetal_health
## Min. :-0.458336 Min. :-2.5754344 Healthy:1655
## 1st Qu.:-0.458336 1st Qu.:-0.3858943 Warning: 471
## Median :-0.458336 Median :-0.3858943
## Mean : 0.000000 Mean : 0.0000000
## 3rd Qu.:-0.458336 3rd Qu.: 1.0108970
## Max. :13.704779 Max. : 1.0108970
We have now 17 variables. 16 of them are numerical, had been transformed and standarized (behave more or less like gaussians, range from 0 to 1). Transformation is crucial for LDA and QDA, and standarization is crucial for LDA (as it assumes equal correlation.)
Now, let´s perform principal component analysis to check which are the predictors that better explain the variability of our data (hence, carry more information about our sample).
X_transs <- X_trans[,1:16]
Y <- X_trans [,17]
X_pcs <- prcomp(X_transs,scale=TRUE)
colors_X <- c(color_2,color_3)[1*(Y=="Healthy")+1]
par(mfrow=c(1,1))
plot(X_pcs$x[,1:2],pch=19,col=colors_X)
get_eigenvalue(X_pcs)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 4.3537266 27.2107913 27.21079
## Dim.2 1.7542996 10.9643723 38.17516
## Dim.3 1.6908935 10.5680847 48.74325
## Dim.4 1.3423103 8.3894392 57.13269
## Dim.5 1.2371720 7.7323250 64.86501
## Dim.6 1.0067872 6.2924203 71.15743
## Dim.7 0.9151083 5.7194269 76.87686
## Dim.8 0.7738885 4.8368033 81.71366
## Dim.9 0.6719093 4.1994334 85.91310
## Dim.10 0.5860738 3.6629615 89.57606
## Dim.11 0.5669214 3.5432586 93.11932
## Dim.12 0.4438006 2.7737535 95.89307
## Dim.13 0.2403512 1.5021950 97.39526
## Dim.14 0.1952423 1.2202644 98.61553
## Dim.15 0.1498119 0.9363245 99.55185
## Dim.16 0.0717034 0.4481463 100.00000
We can see, at a first glance, that our two first principal components do not explain the difference between healthy and warning fetus, as both groups are not exactly separated in the plot. Let´s do a deeper analysis. Let´s run another code and check with parallel computind and a simple plot how many components should we keep.
pca_output <- PCA(X_transs,ncp = 8, graph = FALSE)
pca_output$var$contrib
## Dim.1 Dim.2 Dim.3
## baseline_value_norm 2.047309488 2.585831e+00 30.099178
## accelerations_norm 4.788399956 4.761246e+00 2.211752
## fetal_movement_norm 0.828561539 4.303529e+00 2.271503
## uterine_contractions_norm 3.362182152 4.403995e-01 7.256322
## light_decelerations_norm 7.855496410 7.970260e+00 2.284149
## severe_decelerations_norm 0.053213303 1.325733e+00 2.011617
## prolongued_decelerations_norm 2.021322539 2.096183e+01 3.446026
## abnormal_short_term_variability_norm 6.782333849 1.952482e+01 1.760197
## mean_value_of_st_var_norm 18.231728048 2.362290e-01 1.423859
## perc_of_time_w_abn_long_term_var_norm 12.940053570 3.640952e+00 2.139699
## mean_value_of_lt_var_norm 0.002657663 2.590346e+01 2.135923
## histogram_width_norm 17.997035182 4.090731e-01 3.688097
## histogram_max_norm 8.414556761 2.032200e+00 16.502580
## histogram_number_piks_norm 11.795246186 1.829371e+00 5.252262
## histogram_number_zeros_norm 2.835652294 1.054048e-04 1.839416
## histogram_tendency_norm 0.044251060 4.074964e+00 15.677421
## Dim.4 Dim.5 Dim.6
## baseline_value_norm 0.524394820 8.592464e+00 1.297845e-03
## accelerations_norm 27.026729695 7.865195e+00 8.181894e-01
## fetal_movement_norm 10.384928452 3.095615e+01 5.425782e-01
## uterine_contractions_norm 1.983605485 2.253980e+01 4.119085e+00
## light_decelerations_norm 13.502327025 8.462885e-01 1.262634e+00
## severe_decelerations_norm 2.070396591 5.447026e-04 7.384799e+01
## prolongued_decelerations_norm 0.483312681 5.283718e+00 3.615469e+00
## abnormal_short_term_variability_norm 0.003193994 5.188202e-01 1.726015e-01
## mean_value_of_st_var_norm 0.322436055 7.354417e-02 2.417531e-01
## perc_of_time_w_abn_long_term_var_norm 2.428600697 1.902246e-02 3.016388e-04
## mean_value_of_lt_var_norm 2.753239235 1.480410e+01 1.506151e+00
## histogram_width_norm 0.218625646 5.840983e-01 1.803279e-01
## histogram_max_norm 4.443796479 5.314463e+00 2.877629e+00
## histogram_number_piks_norm 1.157970271 2.174607e+00 2.019101e-02
## histogram_number_zeros_norm 20.874057016 3.514262e-01 2.528479e+00
## histogram_tendency_norm 11.822385856 7.575661e-02 8.265326e+00
## Dim.7 Dim.8
## baseline_value_norm 0.74281555 12.49669442
## accelerations_norm 3.36059293 11.21466258
## fetal_movement_norm 5.84531079 1.80878141
## uterine_contractions_norm 0.06340299 17.02566515
## light_decelerations_norm 4.84068452 0.05694790
## severe_decelerations_norm 12.13852459 4.36831572
## prolongued_decelerations_norm 4.37162360 4.85277759
## abnormal_short_term_variability_norm 0.21692012 0.68697830
## mean_value_of_st_var_norm 0.11535043 0.23379172
## perc_of_time_w_abn_long_term_var_norm 0.93825136 0.77711914
## mean_value_of_lt_var_norm 8.85966640 15.54558483
## histogram_width_norm 0.00646173 0.03453765
## histogram_max_norm 8.45922862 2.57061832
## histogram_number_piks_norm 0.19067809 0.81924091
## histogram_number_zeros_norm 14.08660192 27.45866012
## histogram_tendency_norm 35.76388634 0.04962425
fviz_screeplot(pca_output,ncp=8,addlabels=T,barfill="blue",barcolor="red")
pca_output$eig
## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 4.3537266 27.2107913 27.21079
## comp 2 1.7542996 10.9643723 38.17516
## comp 3 1.6908935 10.5680847 48.74325
## comp 4 1.3423103 8.3894392 57.13269
## comp 5 1.2371720 7.7323250 64.86501
## comp 6 1.0067872 6.2924203 71.15743
## comp 7 0.9151083 5.7194269 76.87686
## comp 8 0.7738885 4.8368033 81.71366
## comp 9 0.6719093 4.1994334 85.91310
## comp 10 0.5860738 3.6629615 89.57606
## comp 11 0.5669214 3.5432586 93.11932
## comp 12 0.4438006 2.7737535 95.89307
## comp 13 0.2403512 1.5021950 97.39526
## comp 14 0.1952423 1.2202644 98.61553
## comp 15 0.1498119 0.9363245 99.55185
## comp 16 0.0717034 0.4481463 100.00000
pca_output_ret <- paran(X_transs,seed=1,graph = TRUE)
##
## Using eigendecomposition of correlation matrix.
## Computing: 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
##
##
## Results of Horn's Parallel Analysis for component retention
## 480 iterations, using the mean estimate
##
## --------------------------------------------------
## Component Adjusted Unadjusted Estimated
## Eigenvalue Eigenvalue Bias
## --------------------------------------------------
## 1 4.203744 4.353726 0.149982
## 2 1.635174 1.754299 0.119125
## 3 1.594786 1.690893 0.096107
## 4 1.266518 1.342310 0.075791
## 5 1.180104 1.237172 0.057067
## --------------------------------------------------
##
## Adjusted eigenvalues > 1 indicate dimensions to retain.
## (5 components retained)
pca_output_ret$Retained
## [1] 5
We should stick to the first 5 principal components (explaining almost 65% of the variability of the model). We can check what are the variables that account the most to explaining this variability.
pca_output$eig[,3][1:5]
## comp 1 comp 2 comp 3 comp 4 comp 5
## 27.21079 38.17516 48.74325 57.13269 64.86501
fviz_contrib(pca_output, choice="var", axes=1,top=5)
fviz_contrib(pca_output, choice="var", axes=2,top=5)
fviz_contrib(pca_output, choice="var", axes=3,top=5)
fviz_contrib(pca_output, choice="var", axes=4,top=5)
fviz_contrib(pca_output, choice="var", axes=5,top=5)
# Check if the first two components give me any sign of subgroups.
colors_X <- c(color_2,color_3)[1*(Y=="Healthy")+1]
par(mfrow=c(1,1))
pairs(X_pcs$x[,1:5],col=colors_X,pch=19,main="The first three PCs")
We can see that component number 3 and number 2 are the ones that better explain for the difference between healthy and warning fetus.
Now, I will perform a Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), Nayve Bayes, and Logistic Regression. All of these approaches are based in probabilities, and depend on a threshold after we predict the probabilities.
If the probability lies in an interval 0.5-1 it will be clasified as “Warning”, while if the probability lies in the interval 0-0.5 it will be classified as "Healthy.
So first, I will find each model maximizing accuracy, and discuss on the results. After that, I will consider that accuracy is not what I am chasing, but actually to minimize my cost-matrix. To do this, I will imagine this research was ordered by an organization (maybe an NGO) whose goal is to fight deaths that could be prevented. If this is the case, we may be more interested in losing some accuracy, but minimizing our cost (specially because the cost of diagnosing an unhealthy fetus as healthy is very expensive, in comparison with the other error.) This is called relative cost learning and I will apply this for the 4 models and then discuss on the result.
Let´s try first just running the models and checking the output without the confusion matrix.
qda.class.datos <- qda(fetal_health ~ ., X_trans)
qda.class.datos
## Call:
## qda(fetal_health ~ ., data = X_trans)
##
## Prior probabilities of groups:
## Healthy Warning
## 0.7784572 0.2215428
##
## Group means:
## baseline_value_norm accelerations_norm fetal_movement_norm
## Healthy -0.135757 0.2329389 -0.04374192
## Warning 0.477023 -0.8185008 0.15370038
## uterine_contractions_norm light_decelerations_norm
## Healthy 0.1395107 0.01742883
## Warning -0.4902126 -0.06124141
## severe_decelerations_norm prolongued_decelerations_norm
## Healthy -0.04691704 -0.1816333
## Warning 0.16485710 0.6382233
## abnormal_short_term_variability_norm mean_value_of_st_var_norm
## Healthy -0.2659398 0.1857047
## Warning 0.9344594 -0.6525292
## perc_of_time_w_abn_long_term_var_norm mean_value_of_lt_var_norm
## Healthy -0.2552409 0.09904268
## Warning 0.8968656 -0.34801622
## histogram_width_norm histogram_max_norm histogram_number_piks_norm
## Healthy 0.1048106 0.02712086 0.04653131
## Warning -0.3682835 -0.09529727 -0.16350172
## histogram_number_zeros_norm histogram_tendency_norm
## Healthy 0.01662044 0.04342063
## Warning -0.05840089 -0.15257144
lda.class.datos <- lda(fetal_health ~ ., X_trans)
lda.class.datos
## Call:
## lda(fetal_health ~ ., data = X_trans)
##
## Prior probabilities of groups:
## Healthy Warning
## 0.7784572 0.2215428
##
## Group means:
## baseline_value_norm accelerations_norm fetal_movement_norm
## Healthy -0.135757 0.2329389 -0.04374192
## Warning 0.477023 -0.8185008 0.15370038
## uterine_contractions_norm light_decelerations_norm
## Healthy 0.1395107 0.01742883
## Warning -0.4902126 -0.06124141
## severe_decelerations_norm prolongued_decelerations_norm
## Healthy -0.04691704 -0.1816333
## Warning 0.16485710 0.6382233
## abnormal_short_term_variability_norm mean_value_of_st_var_norm
## Healthy -0.2659398 0.1857047
## Warning 0.9344594 -0.6525292
## perc_of_time_w_abn_long_term_var_norm mean_value_of_lt_var_norm
## Healthy -0.2552409 0.09904268
## Warning 0.8968656 -0.34801622
## histogram_width_norm histogram_max_norm histogram_number_piks_norm
## Healthy 0.1048106 0.02712086 0.04653131
## Warning -0.3682835 -0.09529727 -0.16350172
## histogram_number_zeros_norm histogram_tendency_norm
## Healthy 0.01662044 0.04342063
## Warning -0.05840089 -0.15257144
##
## Coefficients of linear discriminants:
## LD1
## baseline_value_norm -0.07388097
## accelerations_norm -0.37756476
## fetal_movement_norm 0.01919919
## uterine_contractions_norm -0.24588152
## light_decelerations_norm 0.15628379
## severe_decelerations_norm 0.17057022
## prolongued_decelerations_norm 0.77598935
## abnormal_short_term_variability_norm 0.44740227
## mean_value_of_st_var_norm -0.15952718
## perc_of_time_w_abn_long_term_var_norm 0.59942242
## mean_value_of_lt_var_norm 0.04396969
## histogram_width_norm -0.26895514
## histogram_max_norm 0.50389820
## histogram_number_piks_norm -0.02005009
## histogram_number_zeros_norm 0.04025855
## histogram_tendency_norm 0.09623987
plot(lda.class.datos)
Let´s first divide our sample into a pair of sets: training sets and testing set. This is crucial to test our model, since we want to see the performance of the predictions not along the same data we used to build the model, but for unseen data that we know is part of the generator process of information. We will take 60% of the data for training, and 40% for testing. This is a first approach, as we are guessing the p=0.6, and we are not deducing it fron cross-validation.
spl = createDataPartition(X_trans$fetal_health, p = 0.8, list = FALSE)
FetusTrain = X_trans[spl,]
FetusTest = X_trans[-spl,]
table(FetusTrain$fetal_health)
##
## Healthy Warning
## 1324 377
table(FetusTest$fetal_health)
##
## Healthy Warning
## 331 94
levels(X_trans$fetal_health)
## [1] "Healthy" "Warning"
library(VGAM)
log.fit = glm (fetal_health ~ ., family=binomial(link="logit"), data=FetusTrain)
# We are not really interested in coefficients, as what we are interested in is probabilities. This is why I define an object prediction on the test set.
prob.test = predict(log.fit, newdata=FetusTest, type="response")
head(prob.test)
## 1 4 7 15 18 32
## -0.8489228 -8.0205896 1.9895590 -4.5465278 -0.7147929 -3.3631263
# Now let´s place these into a list, but in labels
pred.test <- as.factor(levels(FetusTest$fetal_health)[max.col(prob.test)])
summary(log.fit)
##
## Call:
## glm(formula = fetal_health ~ ., family = binomial(link = "logit"),
## data = FetusTrain)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3068 -0.2435 -0.0855 -0.0161 3.2663
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.130624 0.193356 -16.191 < 2e-16 ***
## baseline_value_norm -0.058318 0.188001 -0.310 0.75641
## accelerations_norm -1.748658 0.235679 -7.420 1.17e-13 ***
## fetal_movement_norm 0.003792 0.129395 0.029 0.97662
## uterine_contractions_norm -0.614438 0.123108 -4.991 6.01e-07 ***
## light_decelerations_norm 0.353371 0.174664 2.023 0.04306 *
## severe_decelerations_norm 0.215107 0.067125 3.205 0.00135 **
## prolongued_decelerations_norm 1.621264 0.156990 10.327 < 2e-16 ***
## abnormal_short_term_variability_norm 1.245358 0.155823 7.992 1.33e-15 ***
## mean_value_of_st_var_norm -0.608821 0.213369 -2.853 0.00433 **
## perc_of_time_w_abn_long_term_var_norm 0.646350 0.146970 4.398 1.09e-05 ***
## mean_value_of_lt_var_norm 0.161473 0.177934 0.907 0.36415
## histogram_width_norm -0.638153 0.240508 -2.653 0.00797 **
## histogram_max_norm 1.332242 0.240150 5.548 2.90e-08 ***
## histogram_number_piks_norm 0.146322 0.152558 0.959 0.33750
## histogram_number_zeros_norm 0.122743 0.094123 1.304 0.19221
## histogram_tendency_norm 0.252067 0.142036 1.775 0.07595 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1799.55 on 1700 degrees of freedom
## Residual deviance: 688.04 on 1684 degrees of freedom
## AIC: 722.04
##
## Number of Fisher Scoring iterations: 7
tab=table(prob.test>0.5,FetusTest$fetal_health)
tab
##
## Healthy Warning
## FALSE 315 20
## TRUE 16 74
What can we get from this? Well, first that not all the variables seem statistically significative in our analysis. At the same time, we can say that there are some that obivusoly have a lot to do with our problem like: baseline value, accelerations,prolongued decelerations, abnormal short term variability, histogram max (these are significative at 99% of confidence level). At the same time, the interpretation is not so straightforward, as this is not a linear problem. We can say, for instance, that if we increase the baseline value in “1” unit, the logarith of the odds would decrease by 1.8 aproximatly (if all the rest stays untouched). If we take the exponential of this value we would get the odds:
exp(-1.7993)
## [1] 0.1654146
This means if we increase the baseline value by 1 unit, we would decrease our chances of getting a “warning” fetus by 0.1654.
Now, in order to have a better look at out models, we will use CARET. CARET helps us a lot, because not only we can use almost the same structure of code, but also because we can access to a lot of things inside of it.
I will repeat 5 times a 10-fold cross-validation to search for good hiperparameters for the training and testing set. This is: I will divide the data in 10 parts, 9 for training and 1 for testing. And this will be repeated 5 times.
ctrl <- trainControl(method = "repeatedcv",
repeats = 5,
number = 10)
lrFit <- train(fetal_health ~ .,
method = "glmnet",
family = "binomial",
data = FetusTrain,
tuneGrid = expand.grid(alpha = seq(0, 2, 0.1), lambda = seq(0, .1, 0.01)),
metric = "Accuracy",
trControl = ctrl)
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
print(lrFit)
## glmnet
##
## 1701 samples
## 16 predictor
## 2 classes: 'Healthy', 'Warning'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 1531, 1531, 1531, 1531, 1530, 1531, ...
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.0 0.00 0.9027687 0.7002332
## 0.0 0.01 0.9027687 0.7002332
## 0.0 0.02 0.9027687 0.7002332
## 0.0 0.03 0.9018323 0.6943994
## 0.0 0.04 0.9013617 0.6895479
## 0.0 0.05 0.8993659 0.6795598
## 0.0 0.06 0.8970170 0.6697889
## 0.0 0.07 0.8953693 0.6619510
## 0.0 0.08 0.8956053 0.6609502
## 0.0 0.09 0.8941907 0.6548070
## 0.0 0.10 0.8932516 0.6497366
## 0.1 0.00 0.9019535 0.7091795
## 0.1 0.01 0.9042961 0.7108125
## 0.1 0.02 0.9031217 0.7017375
## 0.1 0.03 0.9037140 0.6994385
## 0.1 0.04 0.9019507 0.6909376
## 0.1 0.05 0.8992489 0.6793672
## 0.1 0.06 0.8968987 0.6688783
## 0.1 0.07 0.8965430 0.6654162
## 0.1 0.08 0.8963077 0.6616518
## 0.1 0.09 0.8954848 0.6568759
## 0.1 0.10 0.8936073 0.6470499
## 0.2 0.00 0.9020704 0.7095969
## 0.2 0.01 0.9042975 0.7104338
## 0.2 0.02 0.9035978 0.7031078
## 0.2 0.03 0.9039528 0.7002566
## 0.2 0.04 0.9013638 0.6893074
## 0.2 0.05 0.8997188 0.6808417
## 0.2 0.06 0.8981922 0.6726663
## 0.2 0.07 0.8960800 0.6616337
## 0.2 0.08 0.8939630 0.6518913
## 0.2 0.09 0.8904294 0.6354070
## 0.2 0.10 0.8885436 0.6247200
## 0.3 0.00 0.9020704 0.7095969
## 0.3 0.01 0.9039466 0.7093201
## 0.3 0.02 0.9041881 0.7046042
## 0.3 0.03 0.9030116 0.6979258
## 0.3 0.04 0.9013659 0.6886981
## 0.3 0.05 0.8996019 0.6791405
## 0.3 0.06 0.8953686 0.6596953
## 0.3 0.07 0.8919623 0.6450420
## 0.3 0.08 0.8901934 0.6343273
## 0.3 0.09 0.8892523 0.6278497
## 0.3 0.10 0.8845477 0.6071999
## 0.4 0.00 0.9023050 0.7103593
## 0.4 0.01 0.9043003 0.7102521
## 0.4 0.02 0.9034857 0.7030877
## 0.4 0.03 0.9026628 0.6959440
## 0.4 0.04 0.8999555 0.6823326
## 0.4 0.05 0.8964288 0.6659377
## 0.4 0.06 0.8921955 0.6473214
## 0.4 0.07 0.8904281 0.6365583
## 0.4 0.08 0.8887817 0.6277241
## 0.4 0.09 0.8846605 0.6076780
## 0.4 0.10 0.8797317 0.5832499
## 0.5 0.00 0.9024220 0.7107597
## 0.5 0.01 0.9044172 0.7104104
## 0.5 0.02 0.9038352 0.7037496
## 0.5 0.03 0.9019590 0.6927711
## 0.5 0.04 0.8988974 0.6773249
## 0.5 0.05 0.8943063 0.6570831
## 0.5 0.06 0.8910163 0.6406343
## 0.5 0.07 0.8885422 0.6282954
## 0.5 0.08 0.8841878 0.6070087
## 0.5 0.09 0.8781967 0.5794174
## 0.5 0.10 0.8698561 0.5398882
## 0.6 0.00 0.9023036 0.7104730
## 0.6 0.01 0.9041833 0.7099289
## 0.6 0.02 0.9025410 0.6991622
## 0.6 0.03 0.8993714 0.6830886
## 0.6 0.04 0.8963063 0.6669548
## 0.6 0.05 0.8930094 0.6510749
## 0.6 0.06 0.8900737 0.6360604
## 0.6 0.07 0.8865422 0.6180927
## 0.6 0.08 0.8774943 0.5791677
## 0.6 0.09 0.8689149 0.5383804
## 0.6 0.10 0.8596331 0.4934755
## 0.7 0.00 0.9023036 0.7104730
## 0.7 0.01 0.9028912 0.7061587
## 0.7 0.02 0.9026614 0.6990488
## 0.7 0.03 0.8984268 0.6801758
## 0.7 0.04 0.8948952 0.6619065
## 0.7 0.05 0.8927748 0.6490714
## 0.7 0.06 0.8905395 0.6356636
## 0.7 0.07 0.8817200 0.5983341
## 0.7 0.08 0.8716160 0.5519663
## 0.7 0.09 0.8604566 0.4996763
## 0.7 0.10 0.8498759 0.4470472
## 0.8 0.00 0.9023036 0.7104730
## 0.8 0.01 0.9027757 0.7061534
## 0.8 0.02 0.9030144 0.7001653
## 0.8 0.03 0.8952482 0.6693717
## 0.8 0.04 0.8936011 0.6565372
## 0.8 0.05 0.8920675 0.6445889
## 0.8 0.06 0.8867796 0.6204334
## 0.8 0.07 0.8761960 0.5748795
## 0.8 0.08 0.8647992 0.5215140
## 0.8 0.09 0.8515202 0.4557584
## 0.8 0.10 0.8400044 0.3946768
## 0.9 0.00 0.9023036 0.7104730
## 0.9 0.01 0.9033625 0.7078255
## 0.9 0.02 0.9005444 0.6924654
## 0.9 0.03 0.8932516 0.6622416
## 0.9 0.04 0.8933637 0.6554107
## 0.9 0.05 0.8910073 0.6395601
## 0.9 0.06 0.8807816 0.5970709
## 0.9 0.07 0.8703163 0.5480860
## 0.9 0.08 0.8564572 0.4825049
## 0.9 0.09 0.8443533 0.4186734
## 0.9 0.10 0.8311807 0.3441856
## 1.0 0.00 0.9023036 0.7104730
## 1.0 0.01 0.9041868 0.7102631
## 1.0 0.02 0.8981915 0.6847333
## 1.0 0.03 0.8934855 0.6625510
## 1.0 0.04 0.8927741 0.6527032
## 1.0 0.05 0.8877111 0.6271507
## 1.0 0.06 0.8786639 0.5869390
## 1.0 0.07 0.8651487 0.5242996
## 1.0 0.08 0.8495208 0.4471913
## 1.0 0.09 0.8341275 0.3640001
## 1.0 0.10 0.8220103 0.2891412
## 1.1 0.00 0.9023036 0.7104730
## 1.1 0.01 0.9041868 0.7102631
## 1.1 0.02 0.8981915 0.6847333
## 1.1 0.03 0.8934855 0.6625510
## 1.1 0.04 0.8927741 0.6527032
## 1.1 0.05 0.8877111 0.6271507
## 1.1 0.06 0.8786639 0.5869390
## 1.1 0.07 0.8651487 0.5242996
## 1.1 0.08 0.8495208 0.4471913
## 1.1 0.09 0.8341275 0.3640001
## 1.1 0.10 0.8220103 0.2891412
## 1.2 0.00 0.9023036 0.7104730
## 1.2 0.01 0.9041868 0.7102631
## 1.2 0.02 0.8981915 0.6847333
## 1.2 0.03 0.8934855 0.6625510
## 1.2 0.04 0.8927741 0.6527032
## 1.2 0.05 0.8877111 0.6271507
## 1.2 0.06 0.8786639 0.5869390
## 1.2 0.07 0.8651487 0.5242996
## 1.2 0.08 0.8495208 0.4471913
## 1.2 0.09 0.8341275 0.3640001
## 1.2 0.10 0.8220103 0.2891412
## 1.3 0.00 0.9023036 0.7104730
## 1.3 0.01 0.9041868 0.7102631
## 1.3 0.02 0.8981915 0.6847333
## 1.3 0.03 0.8934855 0.6625510
## 1.3 0.04 0.8927741 0.6527032
## 1.3 0.05 0.8877111 0.6271507
## 1.3 0.06 0.8786639 0.5869390
## 1.3 0.07 0.8651487 0.5242996
## 1.3 0.08 0.8495208 0.4471913
## 1.3 0.09 0.8341275 0.3640001
## 1.3 0.10 0.8220103 0.2891412
## 1.4 0.00 0.9023036 0.7104730
## 1.4 0.01 0.9041868 0.7102631
## 1.4 0.02 0.8981915 0.6847333
## 1.4 0.03 0.8934855 0.6625510
## 1.4 0.04 0.8927741 0.6527032
## 1.4 0.05 0.8877111 0.6271507
## 1.4 0.06 0.8786639 0.5869390
## 1.4 0.07 0.8651487 0.5242996
## 1.4 0.08 0.8495208 0.4471913
## 1.4 0.09 0.8341275 0.3640001
## 1.4 0.10 0.8220103 0.2891412
## 1.5 0.00 0.9023036 0.7104730
## 1.5 0.01 0.9041868 0.7102631
## 1.5 0.02 0.8981915 0.6847333
## 1.5 0.03 0.8934855 0.6625510
## 1.5 0.04 0.8927741 0.6527032
## 1.5 0.05 0.8877111 0.6271507
## 1.5 0.06 0.8786639 0.5869390
## 1.5 0.07 0.8651487 0.5242996
## 1.5 0.08 0.8495208 0.4471913
## 1.5 0.09 0.8341275 0.3640001
## 1.5 0.10 0.8220103 0.2891412
## 1.6 0.00 0.9023036 0.7104730
## 1.6 0.01 0.9041868 0.7102631
## 1.6 0.02 0.8981915 0.6847333
## 1.6 0.03 0.8934855 0.6625510
## 1.6 0.04 0.8927741 0.6527032
## 1.6 0.05 0.8877111 0.6271507
## 1.6 0.06 0.8786639 0.5869390
## 1.6 0.07 0.8651487 0.5242996
## 1.6 0.08 0.8495208 0.4471913
## 1.6 0.09 0.8341275 0.3640001
## 1.6 0.10 0.8220103 0.2891412
## 1.7 0.00 0.9023036 0.7104730
## 1.7 0.01 0.9041868 0.7102631
## 1.7 0.02 0.8981915 0.6847333
## 1.7 0.03 0.8934855 0.6625510
## 1.7 0.04 0.8927741 0.6527032
## 1.7 0.05 0.8877111 0.6271507
## 1.7 0.06 0.8786639 0.5869390
## 1.7 0.07 0.8651487 0.5242996
## 1.7 0.08 0.8495208 0.4471913
## 1.7 0.09 0.8341275 0.3640001
## 1.7 0.10 0.8220103 0.2891412
## 1.8 0.00 0.9023036 0.7104730
## 1.8 0.01 0.9041868 0.7102631
## 1.8 0.02 0.8981915 0.6847333
## 1.8 0.03 0.8934855 0.6625510
## 1.8 0.04 0.8927741 0.6527032
## 1.8 0.05 0.8877111 0.6271507
## 1.8 0.06 0.8786639 0.5869390
## 1.8 0.07 0.8651487 0.5242996
## 1.8 0.08 0.8495208 0.4471913
## 1.8 0.09 0.8341275 0.3640001
## 1.8 0.10 0.8220103 0.2891412
## 1.9 0.00 0.9023036 0.7104730
## 1.9 0.01 0.9041868 0.7102631
## 1.9 0.02 0.8981915 0.6847333
## 1.9 0.03 0.8934855 0.6625510
## 1.9 0.04 0.8927741 0.6527032
## 1.9 0.05 0.8877111 0.6271507
## 1.9 0.06 0.8786639 0.5869390
## 1.9 0.07 0.8651487 0.5242996
## 1.9 0.08 0.8495208 0.4471913
## 1.9 0.09 0.8341275 0.3640001
## 1.9 0.10 0.8220103 0.2891412
## 2.0 0.00 0.9023036 0.7104730
## 2.0 0.01 0.9041868 0.7102631
## 2.0 0.02 0.8981915 0.6847333
## 2.0 0.03 0.8934855 0.6625510
## 2.0 0.04 0.8927741 0.6527032
## 2.0 0.05 0.8877111 0.6271507
## 2.0 0.06 0.8786639 0.5869390
## 2.0 0.07 0.8651487 0.5242996
## 2.0 0.08 0.8495208 0.4471913
## 2.0 0.09 0.8341275 0.3640001
## 2.0 0.10 0.8220103 0.2891412
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.5 and lambda = 0.01.
We have two hiperparamiters: one controlls the lasue, and the other one the ridge. For each combination of hiperparamiters, the computer is repeating it 5 times and computing the accuracy.
lrPred = predict(lrFit, FetusTest)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 309 14
## Warning 22 80
##
## Accuracy : 0.9153
## 95% CI : (0.8847, 0.94)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 5.46e-14
##
## Kappa : 0.7614
##
## Mcnemar's Test P-Value : 0.2433
##
## Sensitivity : 0.9335
## Specificity : 0.8511
## Pos Pred Value : 0.9567
## Neg Pred Value : 0.7843
## Prevalence : 0.7788
## Detection Rate : 0.7271
## Detection Prevalence : 0.7600
## Balanced Accuracy : 0.8923
##
## 'Positive' Class : Healthy
##
Several things to point out.
The accuracy here is 0.9082, and the kappa is 0.7263. These look like pretty good results overall. It is highly probable we do get a better result with logistic regression than the beyesian classificators, as all of them assume that our data comes from a multinormal. We know that some of our data works as a categorical, ordinal variable. This is going to give us not very good results with the bayesian classificators, as Logistic regression does not assume any prior probability.
Specificity is 0.7553. This is important, as is the error that worries me the most. As one would imagine, it would be much worse to diagnose an unhelathy fetus as healthy, than the opposite. So we are much more interested on reducing this error, than to diagnose a healthy baby as unhealthy. Nevertheless, we still have to be causious about the second error, as we do care if we diagnose a fetus as unhealthy, even though it is realy healthy. This is because this error can cause a lot of consequences: the mother may want an abortion, it may cause depression, and more. Nevertheless sensitivity is more than 0.95, that is great.
I am, indeed, willing to lose some accuracy in favor of gaining some more precision in not comitting this error that worries me so much. In this case, out of the 425 fetus of the testing set, 23 of them would probably die and maybe kill their mothers because I diagnosed them as healthy. Even if 5% does not sound like a big mistake, if I consider this as life that could be saved, I would be willing to change my goal and chase a smaller worst mistake. I will discuss this later.
Let´s plot the importance of each variable.
lr_imp <- varImp(lrFit, scale = F)
plot(lr_imp, scales = list(y = list(cex = .95)))
We can see that the most important variable here is the prolongued decelerations, followed by abnormal short term variability.
Let´s check the ROC Curve. This is a very useful tool, as it allows us to evaluate all possible thresholds for splitting predicted probabilities into predicted classes.
lrPred <- predict(lrFit,FetusTest,type="prob")
colAUC(lrPred, FetusTest[["fetal_health"]], plotROC = TRUE)
## Healthy Warning
## Healthy vs. Warning 0.9635212 0.9635212
As we can see, this summarizes the performance across all thresholds. When the area under the curve is 0.5 it referes to a random guess (it´s like having a kappa equal to zero). If the AUC is 1, we are facing a model that always predicts right. As we can see now, the area under the curve here is almost 0.96. This is a very good indication of the performance of our model.
Now, it´s time to try and fit a model with linear discriminany analysis. I will try this with 3 differents methods (or techinques): sparseLDA, regular LDA and StepLDA. I will choose the one with the best performance, but also taking into account if there are big differenecs in the error proportions (shown in the confusion matrix).
First, we will perform LDA Sparse. We can tune the hyperparameters and we can even check the combinations of predictos-parameters, to make a good decision.
LDAGrid <- expand.grid(.NumVars = c(2:15),
.lambda = c(0, 0.01, 0.1, 1, 10, 100))
ldasparseFit <- train(fetal_health ~ .,
method = "sparseLDA",
data = FetusTrain,
tuneGrid = LDAGrid,
metric = "Accuracy",
trControl = ctrl)
ldaPred = predict(ldasparseFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 301 15
## Warning 30 79
##
## Accuracy : 0.8941
## 95% CI : (0.8609, 0.9217)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 4.358e-10
##
## Kappa : 0.7093
##
## Mcnemar's Test P-Value : 0.03689
##
## Sensitivity : 0.9094
## Specificity : 0.8404
## Pos Pred Value : 0.9525
## Neg Pred Value : 0.7248
## Prevalence : 0.7788
## Detection Rate : 0.7082
## Detection Prevalence : 0.7435
## Balanced Accuracy : 0.8749
##
## 'Positive' Class : Healthy
##
plot(ldasparseFit)
ldasparseFit$bestTune
## NumVars lambda
## 66 12 100
Here we can see that our best desition would be to leave with 15 predictors and a lambda=100. Nevertheless, we want to compare models, and would like to stay with the same amount of predictors for all of them (16). At the same time, we see that we have a lot of accuracy when we take a few variables (seems like two predictors is a good idea for any value of lambda).
Again, CARET will find the hyperparameters to maximize our accuracy.
ldasparseFit <- train(fetal_health ~ .,
method = "sparseLDA",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
ldaPred = predict(ldasparseFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 305 15
## Warning 26 79
##
## Accuracy : 0.9035
## 95% CI : (0.8714, 0.9299)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 1.049e-11
##
## Kappa : 0.7312
##
## Mcnemar's Test P-Value : 0.1183
##
## Sensitivity : 0.9215
## Specificity : 0.8404
## Pos Pred Value : 0.9531
## Neg Pred Value : 0.7524
## Prevalence : 0.7788
## Detection Rate : 0.7176
## Detection Prevalence : 0.7529
## Balanced Accuracy : 0.8809
##
## 'Positive' Class : Healthy
##
ldasparseFit$bestTune
## NumVars lambda
## 8 16 1e-04
ldastepFit <- train(fetal_health ~ .,
method = "stepLDA",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82561; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.146
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82636; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.073
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83137; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.024
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82876; in: "mean_value_of_st_var_norm"; variables (1): mean_value_of_st_var_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.077
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82755; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.00 0.00 2.07
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8268; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.056
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82624; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.046
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83072; in: "perc_of_time_w_abn_long_term_var_norm"; variables (1): perc_of_time_w_abn_long_term_var_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.037
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82443; in: "mean_value_of_st_var_norm"; variables (1): mean_value_of_st_var_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.045
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82637; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.354
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8268; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.00 0.00 2.04
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82952; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.095
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82614; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.041
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8281; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.096
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82758; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.029
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83032; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.166
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82757; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.395
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82505; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.351
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82311; in: "mean_value_of_st_var_norm"; variables (1): mean_value_of_st_var_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.447
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82941; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.044
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82614; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.077
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83085; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.00 0.00 2.02
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82637; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.039
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82626; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.077
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82484; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.026
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82549; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.056
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83029; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.00 0.00 2.05
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82624; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.096
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82431; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.035
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82953; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.003
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82638; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.055
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82953; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.127
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82952; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.243
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82876; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.438
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82832; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.0 0.0 2.1
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82559; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.147
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82549; in: "mean_value_of_st_var_norm"; variables (1): mean_value_of_st_var_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.402
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82745; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.188
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8281; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.116
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82576; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.083
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83017; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.00 0.00 2.02
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82822; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.062
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8281; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.031
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82756; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.138
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82431; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.128
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82757; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.00 0.00 2.09
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82941; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.059
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82506; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.00 0.00 2.09
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82876; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.146
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82442; in: "mean_value_of_st_var_norm"; variables (1): mean_value_of_st_var_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.203
## `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1701 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82718; in: "abnormal_short_term_variability_norm"; variables (1): abnormal_short_term_variability_norm
##
## hr.elapsed min.elapsed sec.elapsed
## 0.000 0.000 2.365
ldaPred = predict(ldastepFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 309 53
## Warning 22 41
##
## Accuracy : 0.8235
## 95% CI : (0.7839, 0.8586)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 0.013661
##
## Kappa : 0.4192
##
## Mcnemar's Test P-Value : 0.000532
##
## Sensitivity : 0.9335
## Specificity : 0.4362
## Pos Pred Value : 0.8536
## Neg Pred Value : 0.6508
## Prevalence : 0.7788
## Detection Rate : 0.7271
## Detection Prevalence : 0.8518
## Balanced Accuracy : 0.6849
##
## 'Positive' Class : Healthy
##
ldaFit <- train(fetal_health ~ .,
method = "lda",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
ldaPred = predict(ldaFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 305 14
## Warning 26 80
##
## Accuracy : 0.9059
## 95% CI : (0.874, 0.9319)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 3.872e-12
##
## Kappa : 0.7388
##
## Mcnemar's Test P-Value : 0.08199
##
## Sensitivity : 0.9215
## Specificity : 0.8511
## Pos Pred Value : 0.9561
## Neg Pred Value : 0.7547
## Prevalence : 0.7788
## Detection Rate : 0.7176
## Detection Prevalence : 0.7506
## Balanced Accuracy : 0.8863
##
## 'Positive' Class : Healthy
##
Let´s plot each ROC plot and asses there Area under the Curve.
ldasparsePred <- predict(ldasparseFit,FetusTest,type="prob")
colAUC(ldasparsePred, FetusTest[["fetal_health"]], plotROC = TRUE)
## Healthy Warning
## Healthy vs. Warning 0.9623321 0.9623321
ldastepPred <- predict(ldastepFit,FetusTest,type="prob")
colAUC(ldastepPred, FetusTest[["fetal_health"]], plotROC = TRUE)
## Healthy Warning
## Healthy vs. Warning 0.8464518 0.8464518
ldaPred <- predict(ldaFit,FetusTest,type="prob")
colAUC(ldaPred, FetusTest[["fetal_health"]], plotROC = TRUE)
## Healthy Warning
## Healthy vs. Warning 0.9623321 0.9623321
Betweem the three models, i will stick to the third one (method=lda), as its area under the curve is 0.95 (in comparison with 0.87 and 0.83 in the other two models). At the same time, we get the best accuracy and kappa (accuracy = 0.9012, kappa = 0.7196). Also, although the lda model does not have the biggest sensitivity of all (as the step method has a bigger value), it has the highest score of specificity (and this is the error we care the most). So we will stick to the third model.
qdaFit <- train(fetal_health ~ .,
method = "qda",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
## Warning: model fit failed for Fold08.Rep1: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold06.Rep2: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold04.Rep3: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold03.Rep4: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold03.Rep5: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
qdaPred = predict(qdaFit, FetusTest)
confusionMatrix(qdaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 315 21
## Warning 16 73
##
## Accuracy : 0.9129
## 95% CI : (0.882, 0.938)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 1.654e-13
##
## Kappa : 0.7424
##
## Mcnemar's Test P-Value : 0.5108
##
## Sensitivity : 0.9517
## Specificity : 0.7766
## Pos Pred Value : 0.9375
## Neg Pred Value : 0.8202
## Prevalence : 0.7788
## Detection Rate : 0.7412
## Detection Prevalence : 0.7906
## Balanced Accuracy : 0.8641
##
## 'Positive' Class : Healthy
##
When we run the QDA Model it comes with a problem (an error). This is because we have very unbalanced classes and we have some variables that almost does not change. This is interesting, but yet problematic. I knew this could become a problem, as there are a couple of variables with almost non variability. It should not be a surprise when we are trying to fit a quadratic model, in data that we can picture as a line. In detail, what is happening is that in one of the folds, there is a group (“Healthy”) that presents a variable with no variation, and thus it becomes impossible to invert the correlation matrix (singularity problem). The fact that the “problematic” group is the healthy one makes sense, as there are some variables in which most of the data is equal to zero for the healthy babies (like “severe decelerations”).
All in all, I will keep the results of this model, because out from this issue, it predicts fairly good. The accuracy is 0.9012 and the kappa is 0.697. It has a huge sensitivity value (0.9577) and an “acceptable” (0.7021) specificity value.
naiveFit <- train(fetal_health ~ .,
method = "naive_bayes",
data = FetusTrain,
metric = "Accuracy")
ldaPred = predict(naiveFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 303 11
## Warning 28 83
##
## Accuracy : 0.9082
## 95% CI : (0.8767, 0.9339)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 1.392e-12
##
## Kappa : 0.7498
##
## Mcnemar's Test P-Value : 0.01041
##
## Sensitivity : 0.9154
## Specificity : 0.8830
## Pos Pred Value : 0.9650
## Neg Pred Value : 0.7477
## Prevalence : 0.7788
## Detection Rate : 0.7129
## Detection Prevalence : 0.7388
## Balanced Accuracy : 0.8992
##
## 'Positive' Class : Healthy
##
Naive Bayes is a faily good model too, as it performs pretty alright in predicting. Accuracy is 0.8824 and and kappa is 0.6661. Our less worring mistake in predicting meassued by sensivility is 0.9154 and specificity is 0.7660. We still make a lot of expensive mistakes.
In general, if we are worried about accuracy of predictions, then we would stick (from the models performed above) with our logistic regression model, as it has the higest accuracy and kappa values. Although our lda model is also very good, almost predicts as good as our logistic model. Nevertheless, I was expecting this, as one good thing about logistic regression model is that it does not assume any prior probabilistic distribution for the variables, while the bayes classificators are based on assuming normality. Even though I transformed all my variables to make them behave “closer” to a gaussian, we know that there are some that have still a lot of skewness (are not simetrical). Also, we have some variables that don´t have many different values and this is not a problem with logistic regression.
Let´s change the theshold, because as we discussed before, I want to put more weight on the mistake of diagnosing as a helthy fetus, one that is actually sick. So by changing the threshold from 0.5 to 0.4, for example, we will increase our cheap mistake (diagnosing unhelathy a healthy fetus) and decrease our expensive mistake (diagnosing healthy an unhealthy fetus).So we will increase one, in exchange of decreasing the other one we will be loosing accuracy in our prediction. From 0.5 to 0.4, the fetus is going to be diagnosed as “warning” if the probability is bigger than 40%. We know that any move from 50%-50% of the threshold will make us lose accuracy, as that is the threshold that maximizes accuracy.
Until now, we had left CARET estimate a good value for the threshold, now I will estimate the best value of it, taking into account the weight of each cost. To do this I proceed to build a relative cost matrix.
| Prediction/Reference | Healthy | Warning |
|---|---|---|
| Healthy | 0 | 2 |
| Warning | 0.5 | 1 |
relative.cost <- c(0, 0.5, 2, 1)
The biggest cost is determined with a weight of double the other mistake (as it is much more important not to make the worst mistake in comparison to the “not so bad” mistake). We have to be more sure that a fetus is healthy before classifying it as “healthy”.
Let´s now calculate a good threshold for our models, but optimizing this new decision rule based on the weight of each cost.
cost.i = matrix(NA, nrow = 30, ncol = 9)
# 30 replicates for training/testing sets for each of the 10 values of threshold
j <- 0
ctrl <- trainControl(method = "none")
for (threshold in seq(0.25,0.65,0.05)){
j <- j + 1
cat(j)
for(i in 1:30){
# partition data intro training (80%) and testing sets (20%)
d <- createDataPartition(FetusTrain$fetal_health, p = 0.8, list = FALSE)
# select training sample
train<-FetusTrain[d,]
test <-FetusTrain[-d,]
ldaFit <- train(fetal_health ~ .,
method = "lda",
data = train,
metric = "Accuracy",
trControl = ctrl)
lrProb = predict(ldaFit, test, type="prob")
lrPred = rep("Healthy", nrow(test))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = factor(lrPred)
CM = confusionMatrix(lrPred, test$fetal_health)$table
cost.i[i,j] <- sum(relative.cost*CM)/nrow(test) # unitary cost
}
}
## 123456789
Now, I present a boxplot to check the distribution of each threshold value according to the cost the produce. This will give me the possibility of chosing the threshold that minimizes my cost (multiplying the cost matrix with the confusion matrix).
boxplot(cost.i, main = "Hyper-parameter selection",
ylab = "cost",
xlab = "threshold value",names = seq(0.25,0.65,0.05),col="royalblue2")
Instead of maximizing the accuracy we are minimizing the cost. And we can see in the boxplot that the threshold value that corresponds to 0.3 looks very low. Let´s try with this threshold.
# optimal threshold
threshold = 0.3
# final prediction
ldaFit <- train(fetal_health ~ .,
#method = "lda",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
ldaPred = predict(ldaFit, FetusTest)
lrProb = predict(ldaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 302 3
## Warning 29 91
##
## Accuracy : 0.9247
## 95% CI : (0.8954, 0.9479)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 4.783e-16
##
## Kappa : 0.8011
##
## Mcnemar's Test P-Value : 9.897e-06
##
## Sensitivity : 0.9124
## Specificity : 0.9681
## Pos Pred Value : 0.9902
## Neg Pred Value : 0.7583
## Prevalence : 0.7788
## Detection Rate : 0.7106
## Detection Prevalence : 0.7176
## Balanced Accuracy : 0.9402
##
## 'Positive' Class : Healthy
##
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2623529
As we can see, now that we use a threshold of 0.3, we get: our specificity is now 0.9255 and we make less “worst mistakes” than before. Indeed, out of 425 fetus of the testing set, we only diagnose as healthy 7 fetus that are really un healthy. Our relative cost is of 0.267.
cost.i = matrix(NA, nrow = 30, ncol = 9)
# 30 replicates for training/testing sets for each of the 10 values of threshold
j <- 0
ctrl <- trainControl(method = "none")
for (threshold in seq(0.25,0.65,0.05)){
j <- j + 1
cat(j)
for(i in 1:30){
# partition data intro training (80%) and testing sets (20%)
d <- createDataPartition(FetusTrain$fetal_health, p = 0.8, list = FALSE)
# select training sample
train<-FetusTrain[d,]
test <-FetusTrain[-d,]
ldaFit <- train(fetal_health ~ .,
method = "naive_bayes",
data = train,
metric = "Accuracy")
lrProb = predict(ldaFit, test, type="prob")
lrPred = rep("Healthy", nrow(test))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = factor(lrPred)
CM = confusionMatrix(lrPred, test$fetal_health)$table
cost.i[i,j] <- sum(relative.cost*CM)/nrow(test) # unitary cost
}
}
## 123456789
Now, I will plot the different thresholds with its corresponding hyperparameter.
boxplot(cost.i, main = "Hyper-parameter selection",
ylab = "cost",
xlab = "threshold value",names = seq(0.25,0.65,0.05),col="royalblue2")
It looks like 0.4 is a good threshold to minimize my cost. Let´s check the confusion matrix for our new model.
# optimal threshold
threshold = 0.4
# final prediction
naivebayFit <- train(fetal_health ~ .,
method = "naive_bayes",
data = FetusTrain,
metric = "Accuracy")
qdaPred = predict(naivebayFit, FetusTest)
lrProb = predict(naivebayFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 300 10
## Warning 31 84
##
## Accuracy : 0.9035
## 95% CI : (0.8714, 0.9299)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 1.049e-11
##
## Kappa : 0.7407
##
## Mcnemar's Test P-Value : 0.001787
##
## Sensitivity : 0.9063
## Specificity : 0.8936
## Pos Pred Value : 0.9677
## Neg Pred Value : 0.7304
## Prevalence : 0.7788
## Detection Rate : 0.7059
## Detection Prevalence : 0.7294
## Balanced Accuracy : 0.9000
##
## 'Positive' Class : Healthy
##
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2811765
We did improve our worst error in comparison to our naive bayes when we were trying to maximize accuracy. Our relative error is: 0.305 and now out of 425 we only predict 21 fetus that are unhealthy as healthy.
I can´t run the same formula for the QDA model, as there is Rank defficiancy in some cases. In this case, I will just try different values myself and check which would bring the lowest relative cost or highest specificity.
# optimal threshold
threshold = 0.3
# final prediction
qdaFit <- train(fetal_health ~ .,
method = "qda",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
qdaPred = predict(qdaFit, FetusTest)
qdProb = predict(qdaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(qdProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 308 17
## Warning 23 77
##
## Accuracy : 0.9059
## 95% CI : (0.874, 0.9319)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 3.872e-12
##
## Kappa : 0.7329
##
## Mcnemar's Test P-Value : 0.4292
##
## Sensitivity : 0.9305
## Specificity : 0.8191
## Pos Pred Value : 0.9477
## Neg Pred Value : 0.7700
## Prevalence : 0.7788
## Detection Rate : 0.7247
## Detection Prevalence : 0.7647
## Balanced Accuracy : 0.8748
##
## 'Positive' Class : Healthy
##
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2882353
Now, let´s try with threshold: 0.4
# optimal threshold
threshold = 0.4
# final prediction
qdaFit <- train(fetal_health ~ .,
method = "qda",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
qdaPred = predict(qdaFit, FetusTest)
qdProb = predict(qdaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(qdProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 314 18
## Warning 17 76
##
## Accuracy : 0.9176
## 95% CI : (0.8873, 0.942)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 1.75e-14
##
## Kappa : 0.76
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9486
## Specificity : 0.8085
## Pos Pred Value : 0.9458
## Neg Pred Value : 0.8172
## Prevalence : 0.7788
## Detection Rate : 0.7388
## Detection Prevalence : 0.7812
## Balanced Accuracy : 0.8786
##
## 'Positive' Class : Healthy
##
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2835294
# optimal threshold
threshold = 0.35
# final prediction
qdaFit <- train(fetal_health ~ .,
method = "qda",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
qdaPred = predict(qdaFit, FetusTest)
qdProb = predict(qdaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(qdProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 312 17
## Warning 19 77
##
## Accuracy : 0.9153
## 95% CI : (0.8847, 0.94)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 5.46e-14
##
## Kappa : 0.756
##
## Mcnemar's Test P-Value : 0.8676
##
## Sensitivity : 0.9426
## Specificity : 0.8191
## Pos Pred Value : 0.9483
## Neg Pred Value : 0.8021
## Prevalence : 0.7788
## Detection Rate : 0.7341
## Detection Prevalence : 0.7741
## Balanced Accuracy : 0.8809
##
## 'Positive' Class : Healthy
##
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2835294
Between the three models that I tried, I would keep the third one (threshold equal to 0.35), as the cost is almost minimized, we don´t lose a lot of accuracy, but most of all, we do not make so many bad mistakes.
cost.i = matrix(NA, nrow = 30, ncol = 9)
# 30 replicates for training/testing sets for each of the 10 values of threshold
j <- 0
ctrl <- trainControl(method = "none")
for (threshold in seq(0.25,0.65,0.05)){
j <- j + 1
cat(j)
for(i in 1:30){
# partition data intro training (80%) and testing sets (20%)
d <- createDataPartition(FetusTrain$fetal_health, p = 0.8, list = FALSE)
# select training sample
train<-FetusTrain[d,]
test <-FetusTrain[-d,]
lrFit <- train(fetal_health ~ .,
method = "glmnet",
family = "binomial",
data = train,
metric = "Accuracy",
trControl = ctrl)
lrProb = predict(lrFit, test, type="prob")
lrPred = rep("Healthy", nrow(test))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = factor(lrPred)
CM = confusionMatrix(lrPred, test$fetal_health)$table
cost.i[i,j] <- sum(relative.cost*CM)/nrow(test) # unitary cost
}
}
## 123456789
boxplot(cost.i, main = "Hyper-parameter selection",
ylab = "cost",
xlab = "threshold value",names = seq(0.25,0.65,0.05),col="royalblue2")
# optimal threshold
threshold = 0.3
# final prediction
lrFit <- train(fetal_health ~ .,
method = "glmnet",
family = "binomial",
data = FetusTrain,
metric = "Accuracy",
trControl = ctrl)
lrPred = predict(lrFit, FetusTest)
lrProb = predict(lrFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Healthy Warning
## Healthy 289 8
## Warning 42 86
##
## Accuracy : 0.8824
## 95% CI : (0.8479, 0.9114)
## No Information Rate : 0.7788
## P-Value [Acc > NIR] : 2.672e-08
##
## Kappa : 0.6977
##
## Mcnemar's Test P-Value : 3.058e-06
##
## Sensitivity : 0.8731
## Specificity : 0.9149
## Pos Pred Value : 0.9731
## Neg Pred Value : 0.6719
## Prevalence : 0.7788
## Detection Rate : 0.6800
## Detection Prevalence : 0.6988
## Balanced Accuracy : 0.8940
##
## 'Positive' Class : Healthy
##
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2894118
If we are trying to minimize the total error then we would have to take a threshold of 0.3. This way, our relative error would be of just 0.2894118. We can see how we do not make a lot of expensive mistakes anymore.
All in all, if we want to get our best model (from the ones we tried), but taking into account the cost that both mistakes and successes in predicting: if we want to minimize this cost we would stick to a lda model. This is interesting, as our conclusion is different than when we were trying to maximize accuracy. Now, in order to minimize our cost we should use a LDA. With my LDA here our minimum relative error is of 0.267, in comparison to the 0.2894 that our logistic regression model produces.